UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
CENTRO DE BIOTECNOLOGIA
PROGRAMA DE PÓS-GRADUAÇÃO EM BIOLOGIA CELULAR E MOLECULAR

Marcelo Depólo Polêto

Accessing the molecular flexibility of ligands as a strategy for prospecting drug-receptor interactions

Thesis submitted to the Programa de Pós-Graduação em Biologia Celular e Molecular of the Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, in partial fulfillment of the requirements for the degree of Doctor in Cell and Molecular Biology.

Advisor: Hugo Verli

Porto Alegre
2019

Polêto, Marcelo Depólo. Accessing the molecular flexibility of ligands as a strategy for prospecting drug-receptor interactions / Marcelo Depólo Polêto. – Porto Alegre, 2019. 261 f. Advisor: Hugo Verli. Thesis (Doctorate) – Universidade Federal do Rio Grande do Sul, Centro de Biotecnologia do Estado do Rio Grande do Sul, Programa de Pós-Graduação em Biologia Celular e Molecular, Porto Alegre, BR-RS, 2019. 1. Medicinal Chemistry. 2. Drug Design. 3. Molecular Dynamics. 4. GROMOS. I. Verli, Hugo, advisor. II. Title.

This work was carried out with the support of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Examining Committee:
Prof. Hugo Verli, Advisor
Prof. Márcio Dorn, Instituto de Informática - UFRGS
Prof. Bruno Araujo Cautiero Horta, Instituto de Química - UFRJ
Prof. Paulo Augusto Netz, Instituto de Química - UFRGS
Prof. Guido Lenz (Alternate), Centro de Biotecnologia - UFRGS

Porto Alegre, March 29, 2019

This work is dedicated to all those who, against every obstacle, seek in Education the foundation of their own destiny.

Acknowledgments

First, to God: although the path here was difficult, His unconditional support, inspiration, and daily blessings not only gave me an enormous life lesson, for which I am extremely grateful, but also made the journey easier to understand.

To my mother, Deuseni, for being an infinite source of security, balance, and moral integrity, which shape and will continue to shape my path wherever I go.

To my father, Luiz, for the lessons of wisdom and humility that coincide perfectly with the most important moments of my life.

To my sisters, Carla and Paula, for so wonderfully enriching my life with their children, and also for their essential support throughout this journey.

To my girlfriend, Emily, for bringing more fun, happiness, and completeness to my life, and for being my partner in designing and realizing our dreams. Without your support, I certainly would not be here. I love you!

To my family, for all the support along the way. I love you all!

To Professor Hugo, for supervising this thesis and for the scientific and career lessons along the way.

To the members of the examining committee, for accepting this invitation and devoting their time and attention to improving this work.

To the members of the advisory committee, Dr. Laércio Pol-Fachin and Prof. Paulo Augusto Netz, for following the development of this work.

To the Centro de Biotecnologia and the Programa de Pós-Graduação em Biologia Celular e Molecular, for the opportunity to carry out not only this work but also other activities that enabled my scientific and professional growth.

To the program's secretary, Silvinha, for the thousand and one laughs and for all her ever-willing, good-humored help throughout the doctorate.
To Alexandra Elbakyan, for the reminder that certain struggles, even when taken on by a single person, can shake the status quo and illuminate our perception of the world.

To the friends and associates of the Grupo de Bioinformática Estrutural. It is hard to choose the words to express my gratitude the way I would like. Each of you has a part in building the human being I have become, through conversations, examples, new perspectives, and an enormous partnership, and for that I will be eternally grateful to you. Thank you for welcoming me, and thank you for being the best laboratory I have had the pleasure of working in. From here, forever: Old GBE 4ever!

I thank all my friends who followed this journey, from near or far, for lighting my way with their laughter and hugs.

“A heart on fire is needed to keep dreams warm. Light bonfires.”
Sérgio Vaz

Abstract

The development of new drugs is a multi-step process that is costly in both money and time. In this context, computational methods are often used to interpret and guide experimental assays in order to reduce costs and accelerate development.
In the initial stages, the optimization of lead compounds generally relies on SAR and QSAR methods, which are based on the physicochemical properties of ligands, to better understand the relationship between structure and the biological activity of interest. However, functional groups that SAR or QSAR statistically correlate with activity may be poorly accessible for interaction with the target receptor. Overcoming these limitations by accessing the dynamics of ligands in solution remains a challenge, given the difficulty of obtaining experimental data at atomistic resolution in solution and the limited availability of high-quality molecular mechanics parameters for computational models. Thus, this thesis aims to contribute to addressing these issues through the development, validation, and application of: 1) a large-scale parametrization approach for ligands, focused on biological solutions; 2) a methodology for conformational sampling and analysis of ligand conformational populations, with prospects for automation; 3) an evaluation of the propensity of functional groups to engage in receptor-ligand interactions in aqueous solution. To this end, a parametrization methodology based on the GROMOS philosophy was developed, validated, and applied to generate ligand topologies, which were submitted to molecular dynamics simulations in explicit solvent to obtain their conformational ensembles. The aim of characterizing conformational populations systematically led to the development of a computational tool that identifies ligand conformers and their relative abundances. Moreover, the analytical methodology proposed in this thesis was applied to a case study, providing inferences about the mechanisms of molecular recognition (conformational selection or induction).
Combined, the results presented in this thesis provide a basis for the systematic analysis of the conformational dynamics of free ligands in solution and of the role played by the explicit presence of solvent. They also allow inferences about the dynamics of receptor-ligand complex formation, which can be used as analysis methodologies in the development of new drugs.

Keywords: Force fields. Drug design. Molecular dynamics. GROMOS.

List of figures

Figure 1 – Scheme of the drug development process
Figure 2 – Scheme of the SBDD approach
Figure 3 – Energy funnel of a biomolecule
Figure 4 – Time scale required for conformational transitions
Figure 5 – Chemical events involved in the formation of the receptor-ligand complex
Figure 6 – Time and length scales of cellular events and suitable analysis methods
Figure 7 – General flowchart of the methodological protocol developed in this thesis
Figure 8 – Scheme for generating partial atomic charges
Figure 9 – Scheme for simulating organic liquids and obtaining physicochemical properties
Figure 10 – Scheme for solvation free energy simulations
Figure 11 – Representative scheme of metadynamics simulations
Figure 12 – Representative scheme of the ConfID tool
Figure 13 – Conformational characterization of a flavonoid
Figure 14 – Scheme exemplifying how the ConfID tool works

List of tables

Table 1 – Pharmaceutical market forecast for 2021
Table 2 – Drugs developed using the SBDD approach
Table 3 – Examples of tools used for conformational sampling of ligands
Table 4 – Performance of the parametrization methodology described in Polêto et al.

List of abbreviations and acronyms

ANVISA   Agência Nacional de Vigilância Sanitária (Brazilian Health Regulatory Agency)
MD       Molecular dynamics
DSC      Differential scanning calorimetry
ESP      Electrostatic potential surface
FDA      Food and Drug Administration
FEP      Free energy perturbation
DoF      Degrees of freedom
HF       Hartree-Fock
HTS      High-throughput screening
ITC      Isothermal titration calorimetry
LBVS     Ligand-based virtual screening
m.a.     Atomic mass
MP2      Second-order Møller–Plesset perturbation theory
MM       Molecular mechanics
QM       Quantum mechanics
NOE      Nuclear Overhauser effect
NPT      Isothermal-isobaric ensemble (constant number of particles, pressure, and temperature)
NVT      Canonical ensemble (constant number of particles, volume, and temperature)
PCM      Polarizable continuum model
R&D      Research and Development
PDB      Protein Data Bank
PME      Particle Mesh Ewald
QSAR     Quantitative structure-activity relationships
NMR      Nuclear magnetic resonance
RMSD     Root mean square deviation
RR       Rigid rotor
SAR      Structure-activity relationships
SD       Stochastic dynamics
SBDD     Structure-based drug design
SBVS     Structure-based virtual screening
TI       Thermodynamic integration
VS       Virtual screening

List of symbols

∆D       Change in diffusion
∆Hvap    Enthalpy of vaporization
∆Ghyd    Free energy of solvation in water
D        Diffusion constant
ρ        Density
ε        Dielectric constant
αp       Thermal expansion coefficient
κT       Isothermal compressibility
Cp       Isobaric heat capacity
kB       Boltzmann constant [0.0083 kJ/(mol × K)]
K        Kelvin
Da       Dalton
u        Atomic mass unit

Contents

1 INTRODUCTION
1.1 Drug development
1.1.1 High-throughput screening (HTS)
1.1.2 Rational approach
1.2 Thermodynamics and conformational space
1.2.1 Thermodynamics of receptor-ligand complex formation
1.2.2 Thermodynamics in drug development
1.3 Structural characterization of small molecules
1.3.1 Crystallography and X-ray diffraction
1.3.2 Nuclear magnetic resonance
1.4 Force fields
1.4.1 The potential function and attributes of force fields
1.4.2 Force fields for small ligands
1.4.2.1 Quantum mechanics in force field parametrization
1.4.2.2 Parametrization in the GROMOS philosophy
1.5 Conformational sampling of ligands
1.5.1 Molecular dynamics in the characterization of ligands
2 RATIONALE
3 OBJECTIVES
4 METHODOLOGICAL PROCEDURES
4.1 Derivation of partial charges for MM
4.2 Least Square Fit Solution
4.3 Topology construction
4.4 Generation of torsional profiles
4.5 Simulation of organic liquids
4.6 Calculation of physicochemical properties
4.6.1 Solvation free energy simulations
4.7 Simulation in solvent
4.8 Metadynamics of torsions
4.9 Analyses
4.9.1 Dihedral distribution along the simulation
4.9.2 Characterization of conformational populations
4.9.3 Identification of intra- and intermolecular interactions
5 RESULTS
5.1 Chapter I
5.2 Chapter II
5.3 Chapter III
5.4 Chapter IV
6 GENERAL DISCUSSION
6.1 The parametrization strategy
6.2 Conformational sampling and analysis methodology
6.3 From biological insights to thermodynamic insights
7 CONCLUSIONS
8 PERSPECTIVES
REFERENCES
ANNEXES
ANNEX A – WORKS DEVELOPED DURING THE DOCTORATE
A.1 Homology modeling and molecular dynamics provide structural insights into tospovirus nucleoprotein
A.2 Influence of Na+ and Mg2+ ions on RNA structures studied with molecular dynamics simulations
A.3 Dynamics of Membrane-Embedded Lipid-Linked Oligosaccharides for The Three Domains of Life
A.4 Role of structural ions on the dynamics of the Pseudomonas fluorescens 07a metalloprotease
ANNEX B – SCRIPTS
B.1 CSVMaker
B.2 Least Square Fit
B.3 ConfID
B.4 VirtualTrajMaker
CURRICULUM VITÆ

1 Introduction

“You cannot teach a man anything; you can only help him discover it in himself.”
Galileo

1.1 Drug development

A drug can be defined as a substance that produces a given therapeutic effect in the human body [1–3]. Although humankind has experimented with and consumed substances for this purpose for many centuries, systematic investment in drug Research and Development (R&D) began only in the last 100 years [1, 4]. A special IUPAC commission defined medicinal chemistry as a field concerned with the discovery, development, identification, and interpretation of the mode of action of biologically active compounds at the molecular level [5]. According to the same definition, it also includes the study, identification, and synthesis of the metabolic products of these drugs and of related compounds [5].
As an interdisciplinary science, medicinal chemistry sits at the interface between organic chemistry and the life sciences, such as biochemistry, pharmacology, molecular biology, and toxicology, and also extends into chemistry-related areas such as physical chemistry, crystallography, spectroscopy, and computational simulation techniques [5]. Besides professionals from these fields, medicinal chemistry today also involves physicians, scientists, statisticians, economists, and lawyers, given the scientific complexity and commercial dynamism of the market [1, 4].

The development of new drugs is slow and expensive: cost estimates, from the identification phase through approval, can reach 2 billion dollars, and the process as a whole can take more than 15 years [6–8]. This much time and investment is needed to meet a series of methodological criteria imposed by regulatory agencies, such as the Food and Drug Administration (FDA) in the United States or the Agência Nacional de Vigilância Sanitária (ANVISA) in Brazil. The reason for this methodological rigor is public health, since medicines must have proven efficacy and safety before government agencies approve them for sale [9, 10].

Table 1 – Pharmaceutical market forecast for 2021, prepared and published by the QuintilesIMS Institute [12].

Region              US$ billions (2016)   US$ billions (2021)   Growth
Global              1104.6                1455–1485             4–7%
Developed           749.3                 975–1005              4–7%
  USA               461.7                 645–675               6–9%
  Germany           43.1                  49–59                 2–5%
  United Kingdom    27.0                  34–38                 4–7%
  Italy             28.8                  34–38                 1–4%
  France            32.1                  33–37                 (-1)–2%
  Spain             20.7                  23–27                 1–4%
  Japan             90.1                  90–94                 (-1)–2%
  Canada            19.3                  27–31                 2–5%
  South Korea       13.0                  14–18                 3–6%
  Australia         13.5                  13–16                 0–3%
Emerging            242.9                 315–345               6–9%
  China             116.7                 140–170               5–8%
  Brazil            26.9                  32–36                 7–10%
  India             17.4                  26–30                 10–13%
  Russia            11.6                  14–18                 5–8%
Rest of the World   112.4                 130–160               3–6%

Thus, it is imperative that pharmaceutical companies develop their drugs in the shortest possible time in order to remain competitive. The high cost of the process stems not only from the successful development of a new drug but also from the cost of compounds that failed to show the desired effect or that caused serious adverse reactions during the clinical phases [11]. According to the statistics presented by Ng [1], of the 5000–10000 molecules that look promising in the early stages, only 5 reach the clinical phases and only 1 becomes an approved drug. According to the report published in 2016 by the QuintilesIMS Institute [12], total sales in the global pharmaceutical market are estimated at US$ 1.5 trillion for 2021, with regions such as the USA, China, and Europe accounting for most sales. According to the forecast, sales in the Brazilian pharmaceutical market may grow by 7–10%, reaching US$ 32–36 billion (Table 1).

Typically, drug development begins with basic research, identifying small molecules that may represent new chemical entities with potential for clinical development [2, 13] (Figure 1).
This identification can take place through biological assays using thousands of compounds or through the rational design of molecules, using molecular information from drug-receptor interaction models and their structural information [14, 15]. In this initial phase, hundreds or thousands of compounds may be identified as candidates. These are taken to in vitro and in vivo assays in the preclinical phase to obtain information on their potency and safety, and the candidates that still look promising are advanced to the clinical phase, in which they are subjected to controlled trials in humans [3, 13]. Candidates approved in all previous stages can be submitted for marketing authorization and, after approval by the regulatory agencies, the drug's effects are monitored and documented throughout its commercialization [1, 14].

Figure 1 – Drug development process, from basic research to the post-approval monitoring phase (modified from PhRMA [16]).

The identification of potential candidates in the early phases is therefore a decisive step for the process as a whole, affecting both its success rate and its total cost. Promising molecules can be identified through systematic screening in biological assays in search of compounds with the desired activity, or in combination with rational approaches that aim to optimize the desirable characteristics of the compounds. These two approaches are discussed below.

1.1.1 High-throughput screening (HTS)

For most of the 20th century, drugs were developed from endogenous ligands, natural products, or already approved drugs, with small modifications to improve their efficacy in in vivo models [1, 14].
Advances in biochemistry and molecular biology in the 1970s and 1980s gave in vitro assays a better cost-benefit ratio, which fueled an ambition in the field of medicinal chemistry to test ever larger numbers of compounds through high-throughput screening (HTS) [7, 14, 17]. Advances in combinatorial chemistry during the same period also drove the popularization of HTS approaches [7, 14, 17]. The approach is based on screening tens or hundreds of thousands of compounds in controlled biological assays in search of molecules with the desired activity, and it requires no prior structural knowledge of the receptor or the ligand [11]. The biological assays used in the screening can be performed in vivo, using cell or tissue cultures, or in vitro, using purified target receptors [11]. Molecules identified as active are called hits; following Hughes et al. [7], a hit is a compound whose desired activity is identified in a screen and can be confirmed upon retesting. In general, the hits initially identified have low potency or affinity. They are selected for their potential to be optimized with respect to a series of pharmacodynamic properties (such as potency, affinity, and selectivity) and pharmacokinetic properties (such as absorption, bioavailability, and metabolism), yielding lead compounds that are later developed into new chemical entities [2, 15, 18]. The compound libraries used in HTS vary in composition, from crude plant extracts to customized sets of synthetic molecules [2].
Given the vast chemical space available to molecules with pharmacological potential [19], HTS has been compared to searching for "a needle in a haystack" [17, 20], but it must be acknowledged that it is responsible for the identification and development of many drugs available today [1, 14, 15, 20]. However, the continued progress of medicinal chemistry, in both its chemical and computational branches, together with advances in X-ray crystallography and NMR methodologies, has allowed prior structural knowledge to be introduced into the process [2].

1.1.2 Rational approach

Advances in combinatorial chemistry and the growth of structural and genomic information on potential biological targets for pharmacological action allowed the emergence of an approach to drug development that depends less on chance [3]. In particular, advances in molecular and structural biology, as well as in computer science, enabled a large increase in the number of high-resolution three-dimensional structures determined for various molecular targets [3]. The creation of the Protein Data Bank (PDB) [21], a repository of three-dimensional structures, broadened access to structural information on target receptors. Since its creation in 1971, the PDB has received almost 150 thousand three-dimensional structures obtained by X-ray crystallography or nuclear magnetic resonance (NMR) [22]. This structural information provides a better spatial understanding of the active sites of target receptors and of potential binding modes with bioactive molecules [3, 22, 23]. Further details on the use of X-ray crystallography and nuclear magnetic resonance in the characterization of small molecules are covered in Section 1.3.
In addition, growing computational power allowed the emergence of algorithms and models able to handle the growing volumes of biological information quantitatively, paving the way for structure-based drug design (SBDD) [2, 13, 24], an approach that has greatly benefited the field of drug development [25–27]. As discussed below, structural information on the target receptor supports the rational design of new ligands with greater complementarity to its binding site [3, 24], as shown by the success cases highlighted in Table 2 and documented by Andricopulo et al. [3].

Table 2 – Drugs developed using the SBDD approach, documented by Andricopulo et al. [3].

Drug          Brand name    Company       Molecular target                  Indication
Imatinib      Gleevec®      AstraZeneca   BCR-Abl tyrosine kinase           Cancer
Raltitrexed   Tomudex®      AstraZeneca   Thymidylate synthase              Cancer
Dorzolamide   Trusopt®      Merck & Co.   Carbonic anhydrase                Glaucoma
Captopril     Capoten®      BMS           Angiotensin I-converting enzyme   Hypertension
Oseltamivir   Tamiflu®      Roche         Neuraminidase                     Influenza
Zanamivir     Relenza®      GSK           Neuraminidase                     Influenza
Amprenavir    Agenerase®    GSK           HIV protease                      AIDS
Indinavir     Crixivan®     Merck & Co.   HIV protease                      AIDS
Lopinavir     Kaletra®      Abbott        HIV protease                      AIDS
Nelfinavir    Viracept®     Pfizer        HIV protease                      AIDS
Ritonavir     Norvir®       Abbott        HIV protease                      AIDS
Saquinavir    Invirase®     Roche         HIV protease                      AIDS

Advances in computational power enabled the generation of large virtual ligand libraries, much larger than the synthetic libraries used in HTS. To process this large volume of compounds, computational methods were developed to automatically evaluate these databases, reduce their size, and prioritize compounds for biological evaluation [14]. These computational methods are known as virtual screening (VS) tools.

Figure 2 – Scheme of the structure-based drug design process (reproduced from Andricopulo et al. [3]).

VS methods can be divided into structure-based (structure-based virtual screening, SBVS) and ligand-based (ligand-based virtual screening, LBVS) methods [3, 14]. While SBVS methods are applied when structural information about the target receptor is available, LBVS methods use only information derived from ligands with known biological activity [3, 14]. Among the techniques used in SBVS, molecular docking stands out: it uses three-dimensional information on the target receptor and the ligand, predicts multiple binding modes between the two molecules, and ranks them according to internal scoring functions [27–31]. Software dedicated to protein-ligand docking, such as GOLD [32], Glide [33, 34], FlexX [35], AutoDock [36], and Vina [37], has received great attention in recent years because of its versatility and applicability. The most common types of protein-ligand docking are rigid, in which ligand and receptor are treated as rigid bodies during the search for binding modes; flexible, in which single bonds of the ligand and the side chains of specific amino acids are allowed to rotate; and partially flexible, in which part of the system remains rigid while another part has torsional freedom to search for binding modes [30, 38]. Intrinsic challenges of molecular docking include handling receptor and ligand flexibility accurately, the quality of the structural information available for the receptor, the selection of the ligand library, and the accuracy of the scoring function [3, 29–31, 38].
Within the LBVS approach, quantitative structure-activity relationship (QSAR) studies stand out; they correlate ligand structures, encoded as mathematical descriptors, with their respective activities at a target receptor [39–41]. The first studies using physicochemical properties of small molecules and fragments as descriptors [42] provided important theoretical foundations for the following decades [43–47]. Commonly used descriptors in structure-activity correlations include log P, a measure of hydrophobicity, the number of chemical bonds, and connectivity [39, 44, 48]. In addition, some molecular properties are commonly used to build correlations between molecular fragments and chemical substitutions, such as the molar refractivity (MR), related to the refractive index and molecular weight; the hydrophobic constant π, defined from the difference between the log of the partition coefficient of a substituted molecule and that of the unsubstituted molecule; and the Hammett constant σ, a measure of the electron-donating or electron-withdrawing tendency of a given substituent [14, 39, 44, 48]. The descriptors discussed so far are classified as one- or two-dimensional and do not take into account conformational information about the ligands [3, 14]. Such descriptors are also known to ignore information about the stereochemistry of the ligands analyzed, which can produce spurious relationships [14]. For this reason, 3D descriptors can also be used in QSAR studies, as in the methods CoMFA (comparative molecular field analysis) [49, 50] and CoMSIA (comparative molecular similarity indices analysis) [51, 52].
CoMFA and CoMSIA use three-dimensional information from the conformations of multiple ligands to compute contour maps of properties such as hydrophobicity, steric clashes, electrostatics, and hydrogen-bond donation and acceptance, which makes it easier to visualize the regions of space around the ligand that are statistically related to an increase or decrease in biological activity [3, 14]. However, 3D-QSAR is best applied to conformations obtained experimentally, by either X-ray crystallography or NMR, since the method seeks to relate structural information to ligand conformations known to be active [41]. Even so, many researchers use molecular docking to generate potentially active conformations and combine them with 3D-QSAR, an approach that has proved useful in drug development [53–55].

1.2 Thermodynamics and conformational space

From a thermodynamic point of view, cells can be regarded as isothermal systems, in which heat flow cannot be used as an energy source to maintain cellular homeostasis [56–59]. Living organisms therefore use the Gibbs free energy (G) to perform work and maintain the dynamic cellular equilibrium [56–59]. Fundamentally, G represents the amount of energy available to perform work at constant temperature and pressure, and it is a function of the chemical composition and configuration of the initial and final states of an isothermal process at a temperature T. In this sense, G is a function of the enthalpy (H) and the entropy (S) of a system. In molecular terms, H reflects the number and types of chemical bonds in the system, whereas S quantitatively expresses the disorder of a system, or the number of possible configurations [56–59] (Equation 1.1).
$$\Delta G = G_{\mathrm{final}} - G_{\mathrm{initial}} = \Delta H - T\Delta S \tag{1.1}$$

However, quantifying the enthalpy or entropy of a complex system is not trivial. For this reason, the free energy of a process can also be expressed as a function of the concentrations of the molecules in the initial and final states. For the chemical equilibrium in Equation 1.2, with equilibrium constant K, the free energy is given by Equation 1.3 (where R in the product RT is the gas constant, not the receptor):

$$R + L \rightleftharpoons RL, \quad \text{where } K = \frac{[RL]}{[R][L]} \tag{1.2}$$

$$\Delta G = \Delta H - T\Delta S = -RT \ln K \tag{1.3}$$

In energetic terms, the number and types of chemical interactions present in a system, together with the spatial arrangement of its molecules, determine its Gibbs free energy [56–59]. Different conformations of a biomolecule are therefore in chemical equilibrium with one another according to their values of G, since each conformation produces different types and numbers of intramolecular and intermolecular interactions, as well as different spatial arrangements of the surrounding solvent [60–62]. Biomolecules such as proteins, carbohydrates and ligands thus populate different conformational states in solution, in chemical equilibrium according to their Gibbs free energy, which can be represented as a free-energy surface of a molecule as a function of its conformations (Figure 3).

Protein folding is an example of this energy-conformation relationship. Because of the ruggedness of its energy surface, a protein would take an absurdly long time to find its native structure by systematically trying every conformational possibility [63]. Yet protein folding is observed experimentally on the millisecond-to-second timescale (and may take up to hours) [64, 65], as noted by Levinthal [63] in 1969, giving rise to the paradox that bears his name.
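The size of the mismatch behind Levinthal's paradox can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the original text: the chain length (100 residues), the number of states per residue (3) and the sampling rate (10^13 conformations per second) are illustrative assumptions, and the function name is hypothetical.

```python
# Back-of-the-envelope Levinthal estimate.
# All numerical inputs below are illustrative assumptions, not values from the thesis.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_time_s(n_residues, states_per_residue, rate_per_s):
    """Time (in seconds) needed to visit every conformation exactly once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / rate_per_s


t_seconds = exhaustive_search_time_s(
    n_residues=100,        # assumed chain length
    states_per_residue=3,  # assumed number of states per residue
    rate_per_s=1e13,       # assumed conformations sampled per second
)
t_years = t_seconds / SECONDS_PER_YEAR
print(f"Exhaustive search would take about {t_years:.1e} years")
```

Under these assumptions the exhaustive search takes on the order of 10^27 years, many orders of magnitude longer than the experimentally observed folding times, which is why a random, unbiased search cannot be the folding mechanism.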
Levinthal himself proposed that proteins follow a well-defined pathway to their native structure, under kinetic and energetic control. In the 1980s, the thermodynamic concept of protein folding was expanded to take into account, for example, the enthalpic contributions of hydrogen bonds and salt bridges formed between residues, van der Waals forces and hydrophobic effects, as well as the entropic contributions of the solvent surrounding the molecule and of the molecule's own conformation [66–69]. In the 1990s, Onuchic et al. [70] proposed the energy-funnel hypothesis of protein folding, in which the free energy decreases as the number of possible configurations narrows [71].

From an atomic perspective, the temperature of a system is related to the kinetic energy of its atoms [56, 57]. A direct consequence is that small and large molecules in any system with T > 0 K are in permanent translational, rotational and vibrational motion [56, 57]. This atomic motion allows a molecule to visit different conformational states along its energy surface. In practice, the molecule explores a certain "conformational space", moving between energy minima and producing conformational populations that coexist in thermodynamic equilibrium according to their values of G [73–75]. These conformational transitions require crossing the energy barriers between minima and, as discussed by Göbl et al. [76], the higher the energy barrier involved, the longer the transition takes to occur isothermally, as observed in NMR experiments [77].
In this sense, small conformational fluctuations within the same energy minimum may take around $10^{-12}$ to $10^{-9}$ seconds in a biological medium, whereas transitions to other energy minima may take $10^{3}$ to $10^{9}$ times longer (Figure 4).

Figure 3 – Conformational space of a biomolecule as a function of the free energies of its conformations. As the free energy decreases, the conformation approaches the native one (adapted from Müller, Wu and Palczewski [72]).

1.2.1 Thermodynamics of receptor-ligand complex formation

Consider, for example, the formation of the complex RL from the ligand L and the receptor R:

$$R + L \rightleftharpoons RL, \quad \text{where } K = \frac{[RL]}{[R][L]} \tag{1.4}$$

Experimentally, the thermodynamic properties of the RL complex can be quantified using high-sensitivity calorimetry techniques [78, 79].

Figure 4 – Free-energy surface as a function of the conformations of a biomolecule and the timescale required for transitions between conformational populations (taken from Göbl et al. [76]).

One of these techniques is differential scanning calorimetry (DSC), developed in the 1960s, which allows the heat capacity of the receptor-ligand interactions in the RL complex to be calculated from heat increments over a given temperature range [80–82]. Also in the 1960s, isothermal titration calorimetry (ITC) was described as a method for quantifying the equilibrium constant K and ΔH of acid-base equilibria and metal-ion complexation reactions [83, 84]. In 1979, Biltonen and Langerman [85] described the use of these microcalorimeters in biological chemistry, including a discussion of the instruments available at the time, applications, experimental design and data analysis, which helped popularize the technique.
Just 10 years later, the first calorimeter specifically designed for the study of biological systems became commercially available from MicroCal [86], which has enabled wider dissemination of the technique ever since [79, 87]. ITC thus allows the ΔH of RL complex formation to be measured together with its binding constant, from which the ΔG and ΔS of the process can be derived [78, 79]. This energy decomposition has been used to evaluate the dynamics of biomolecules, for example the relationship between entropy and the flexibility of antibodies [88]. In addition, ΔH data have been proposed as a guide for iterative improvements aimed at increasing the enthalpic component of RL formation [89]. Currently, microcalorimeters have a high sensitivity, on the order of 0.1 µcal, which allows binding constants in the range of $10^{8}$ to $10^{9}$ M$^{-1}$ to be determined [90]. The technique has been used to quantify the affinity between small molecules and their receptors [91–94] and to evaluate competition for the active site [95], cooperativity, and binding events linked to the protonation [96, 97] or tautomeric state of the components [98]. In some specific cases, ITC can also be used to evaluate the kinetics of RL complex formation [99–102].

However, ITC or DSC alone does not provide explicit structural information on the chemical events that occur at the molecular level during the process [78, 79]. Traditionally, inferences of this kind have been made by combining site-directed mutagenesis with techniques for measuring kinetic rates (such as ITC) [103–105], which allows the impact of structural changes on receptor-ligand binding rates to be derived indirectly.
Explicit structural information has been obtained by combining experiments with computational simulations, as demonstrated by Schmidtke et al. [106], who described the role that surrounding water molecules can play in the rate of RL complex formation. In structural terms, the dynamics of the solvent around R and L plays a fundamental role in binding thermodynamics [107–110] (Figure 5). The enthalpy change on binding results from the formation and disruption of a large number of interactions, including the loss of the hydrogen bonds and van der Waals interactions that R and L form with the solvent. Analogously, the entropy change is related to the rearrangement of the solvent in the system, to the conformational change of receptor and ligand upon binding, and to their loss of rotational and translational degrees of freedom [105]. In this sense, ΔH and ΔS can be regarded as the factors that drive the formation of the RL complex. Indeed, decomposing Equation 1.1 in these terms leads to Equations 1.5 and 1.6.

$$\Delta G = (H_{RL} - T S_{RL}) - \left[(H_R - T S_R) + (H_L - T S_L)\right] \tag{1.5}$$

$$\Delta G = \underbrace{H_{RL} - (H_R + H_L)}_{\text{enthalpic cost}} - T \underbrace{\left[S_{RL} - (S_R + S_L)\right]}_{\text{entropic cost}} \tag{1.6}$$

Figure 5 – Chemical events involved in the formation of the receptor-ligand complex and in its enthalpic and entropic cost. Left, removal of the solvation shell from the active site, which is required for complexation. Right, the configurational change of water molecules proximal or distal to the solute (reproduced from Kastritis and Bonvin [111]).

The contributions of ΔH and ΔS to the formation of RL are closely related.
For example, a highly complementary binding event, in which multiple favorable interactions form between R and L, has a large favorable (negative) enthalpic contribution, but it is commonly accompanied by conformational and motional restriction of both partners, resulting in an unfavorable entropic contribution (negative ΔS) and a final free energy of moderate magnitude [105, 112]. Analogously, a large entropic gain is commonly accompanied by an enthalpic penalty arising from the energy required to break non-covalent interactions between receptor and ligand in the RL complex [105, 112]. With the popularization of ITC in the study of biological systems in recent decades, enthalpy-entropy compensation has been used as an optimization target in drug development [113–116]. The strategy is to maximize the favorable enthalpic and entropic contributions and to mitigate the associated penalties, in pursuit of a more exergonic RL formation [117].

1.2.2 Thermodynamics in drug development

Traditionally, the pharmaceutical industry has used a range of computational techniques in the early stages of drug development, such as QSAR studies and molecular docking, to optimize increasingly effective ligands [113–116]. However, the formation of a receptor-ligand complex involves a series of chemical events around the ligand and the receptor that are difficult to predict, such as the solvent accessibility of some heteroatoms, the lifetime of hydrogen bonds and the structuring of the proximal solvent [118, 119]. The popularization of ITC and the falling cost of obtaining target receptors and ligands have allowed medicinal chemistry to delve deeper into the thermodynamics of binding, seeking new information that enables the development of more potent drugs [118, 120, 121].
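The way a binding constant and a measured enthalpy combine into a thermodynamic signature can be sketched numerically from Equations 1.1 and 1.3. The snippet below is only an illustration: the function names are hypothetical, the values of K and ΔH are invented inputs rather than data from any cited study, and K is assumed to be expressed relative to a 1 M standard state.

```python
import math

R_GAS = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(k_assoc, temperature=298.15):
    """Delta G = -RT ln K (Equation 1.3); K relative to a 1 M standard state."""
    return -R_GAS * temperature * math.log(k_assoc)


def entropic_term(delta_g, delta_h):
    """Return -T*Delta S, obtained from Delta G = Delta H - T*Delta S (Equation 1.1)."""
    return delta_g - delta_h


# Hypothetical ITC-like inputs, chosen only for illustration.
k_assoc = 1.0e8   # association constant, M^-1
delta_h = -15.0   # binding enthalpy, kcal/mol

delta_g = binding_free_energy(k_assoc)
minus_t_delta_s = entropic_term(delta_g, delta_h)
print(f"dG = {delta_g:.2f}  dH = {delta_h:.2f}  -TdS = {minus_t_delta_s:.2f} kcal/mol")
```

With these invented inputs, ΔG is about −10.9 kcal/mol while −TΔS is about +4.1 kcal/mol: a favorable enthalpy partly offset by an entropic penalty, which is the compensation pattern described in Section 1.2.1.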
In a meta-analysis using information from databases such as Scorpio and BindingDB [122] and data from the companies Astex and AstraZeneca, Williams et al. [123] observed that small ligands bind owing to an enthalpic gain, whereas larger ligands, which consequently have more degrees of freedom, can also rely on an entropic gain. The data also show that the overwhelming majority of the compounds investigated by Astex and AstraZeneca bind in an enthalpically driven manner [123]. Furthermore, earlier studies showed that ligands that bind to a receptor in an entropically driven manner tend to be poorly selective and to form spurious interactions [124]. This also suggests that a fine balance between entropic and enthalpic gains is the main goal of approaches based on thermodynamic data, especially with regard to the predictive capacity of computational methods.

However, a considerable share of the drugs approved worldwide have more degrees of freedom than small molecular fragments. As ligand size increases, the energetic contributions arising from protein and solvent reorganization are therefore expected to be larger [123]. In attempts to exploit this entropic contribution, entropy-based optimization of ligand affinity has traditionally been tied to increases in lipophilicity [125], owing to the burial of the ligand's hydrophobic surface upon complexation, and in the molecular mass of the ligand, a phenomenon known as "molecular obesity" [126].

1.3 Structural characterization of small molecules

1.3.1 Crystallography and X-ray diffraction

Crystallography is based on the slow, ordered precipitation of molecules under controlled conditions to form periodic arrays, that is, crystals [127, 128]. The formation of these ordered aggregates depends on combinations of multiple parameters that directly affect crystal formation, such as temperature, pH or buffer solution [129], which makes it challenging to devise a single crystallization protocol that can be extrapolated to different biomolecules. The quality of a crystal is directly linked to its ability to diffract X-rays in a regular, detectable pattern [129]. This stems from the intrinsic physics of the process: when the emitted X-ray beam reaches the electron shells of an atom, secondary radial waves are generated in a process called scattering, and these waves are detected and subsequently transformed into electron density maps of the molecule under study [129, 130]. The repetition of the molecular organization and the periodic arrangement in a crystal therefore reinforce the scattering signals, increasing the resolution and intensity of the collected data [131]. Once the electron density map is available, computational techniques are used to reconstruct the structural model that best fits it [129].

Obtaining high-resolution electron density maps from X-ray diffraction therefore requires that the molecules forming the crystal be conformationally very similar; otherwise the signals would be too diverse, resulting in inconsistent, low-resolution data [130, 131]. For this reason, the technique is of limited use for determining the structure of highly flexible molecules [131]. The structural characterization of ligands by crystallography is therefore commonly carried out under some degree of conformational restraint, for example with the ligand bound to its target receptor [132].
Techniques for soaking ligands into protein crystals, introduced in the 1950s and 1960s, have become commonly used to study receptor-ligand interactions and enzymatic mechanisms [133]. However, their success depends on the existence of solvent channels permeating the crystal lattice that allow ligand molecules to diffuse to their receptors [133, 134]. Other challenges have been pointed out, such as the need for the binding site not to be blocked by crystal-packing effects, and interference from the solvent or from the pH of the solutions used [134]. In addition, systematic reviews of the PDB have revealed errors in ligand geometry in structures deposited over the years, such as loss of planarity in aromatic rings or of the tetrahedral geometry of sp3 carbons [135, 136]. Part of these errors arose from the lack of algorithms designed specifically to interpret the electron density of ligands correctly [135, 136]. To mitigate them, recent efforts have produced software dedicated to the correct interpretation of ligand geometries in crystallographic data, such as AceDRG [137].

Accurate three-dimensional structures of receptor-ligand complexes can provide a variety of information on the interactions involved, from the binding mode to potential spaces to be explored in the active site [132, 134], which is of particular interest in the development of new drugs [3]. However, owing to the conformational restriction imposed by the solid phase, crystallographic structures provide little information on the conformational dynamics of ligands or of the receptor-ligand complex [129].
1.3.2 Nuclear Magnetic Resonance

Nuclear magnetic resonance (NMR) is a technique capable of providing structural and dynamic information on molecules in solution, which is an advantage over crystallography in biological studies. NMR therefore allows the study of molecular events that occur over time, such as intramolecular motions, chemical reactions or even protein folding [138] (Figure 6). The technique is based on the effect of magnetic fields on the angular momentum of nuclear spins. Manipulating magnetic pulses makes it possible to measure the energy involved in transitions between spin energy states. In drug development, the most commonly studied nuclei are 1H, 13C, 15N, 19F and 31P, as they are the most commonly found in biomolecules. In general, NMR spectrometers are classified according to the resonance frequency of 1H in the applied magnetic field (the Larmor frequency). A series of magnetic pulses makes it possible to probe the chemical environment of the atoms under analysis, and the information collected is later compiled to obtain the three-dimensional structure of molecules [139]. For structure determination, the nuclear Overhauser effect (NOE) is frequently used in two-dimensional correlation spectra [140]; it arises from the influence of the magnetization of atoms that are close in space (generally within 5 Å) but not covalently bonded [139]. Three-dimensional structural models are then calculated so as to satisfy the restraints imposed by the data collected in the NMR experiments.

The study of protein-ligand interactions by NMR can be divided into receptor-based and ligand-based approaches [141]. In receptor-based methods, a spectrum of the protein is recorded while the ligand is titrated, allowing the amino acid residues involved in the interaction with the ligand to be identified through their chemical shifts. In ligand-based methods, a spectrum of the ligand is recorded and the protein is added, allowing the ratio between free and bound ligand to be measured, from which receptor-ligand affinity parameters can be derived [141, 142].

Figure 6 – Time and length scales of various cellular and biomolecular phenomena and the analytical methods best suited to each event (modified from Dror et al. [138]).

Although NMR experiments are carried out in solution and make it possible to consider the role of the conformational dynamics of ligand and receptor, complex formation in a biological medium takes place in aqueous solvent, so the organic solvents commonly used in these experiments may affect the conformational description of the molecules involved [143]. Moreover, water molecules are known to play an important role as mediators of receptor-ligand interactions [144] and should therefore not be neglected.

With technological advances in sensors and the increasing power of the magnetic pulses emitted, recent developments in NMR spectrometry have enabled its application to the structure determination of small molecules, leading to major advances in medicinal chemistry [145]. However, the motions of small molecules occur on the nanosecond timescale, still far from the time resolution currently achievable in NMR spectrometry [141, 142, 146]. To work around this, NMR experiments can be performed at low temperatures in order to lower the kinetic energy of the molecules under analysis and, consequently, the energy available for conformational interconversion, which makes it possible to identify conformational populations and their relative abundances [147, 148].
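The effect of temperature on conformational interconversion can be sketched with two textbook relations that are not part of the original text: the Eyring equation from transition state theory for the rate of barrier crossing, and Boltzmann weights for equilibrium populations. In the sketch below the barrier height and the relative free energies are hypothetical, the transmission coefficient is assumed to be 1, and the free energies are assumed not to change with temperature.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
R_GAS = 8.314462618        # gas constant, J/(mol*K)


def eyring_rate(barrier_kj_mol, temperature):
    """Rate constant (1/s) for crossing a free-energy barrier (Eyring equation)."""
    boltzmann_factor = math.exp(-barrier_kj_mol * 1e3 / (R_GAS * temperature))
    return (K_B * temperature / H_PLANCK) * boltzmann_factor


def boltzmann_populations(free_energies_kj_mol, temperature):
    """Equilibrium fractions of conformers from their relative free energies."""
    g_min = min(free_energies_kj_mol)
    weights = [
        math.exp(-(g - g_min) * 1e3 / (R_GAS * temperature))
        for g in free_energies_kj_mol
    ]
    total = sum(weights)
    return [w / total for w in weights]


barrier = 40.0                    # kJ/mol, hypothetical interconversion barrier
conformer_g = [0.0, 2.0, 5.0]     # kJ/mol, hypothetical relative free energies

for temperature in (298.15, 198.15):
    lifetime_s = 1.0 / eyring_rate(barrier, temperature)
    populations = boltzmann_populations(conformer_g, temperature)
    print(temperature, f"{lifetime_s:.2e} s", [round(p, 3) for p in populations])
```

Under these assumptions, cooling the sample by 100 K lengthens the average time between interconversions by several orders of magnitude and shifts the populations toward the lowest free-energy conformer, which is the rationale for low-temperature NMR described above.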
1.4 Force Fields

1.4.1 The potential function and attributes of force fields

In 1865, the German chemist August Wilhelm von Hofmann introduced the "ball-and-stick" model to demonstrate structural concepts in a lecture at the Royal Institution [149]. The aim of understanding the structure of certain molecules in greater detail led to the incorporation of quantitative methods into attempts at structure prediction, which began to be developed between the 1930s and 1960s [150–154]. Decades of work resulted in analytical expressions for the energy surface of a molecule as a function of its atomic coordinates, more commonly known as force fields [149]. Over the years, and especially from the 1980s onward, research groups around the world devoted great effort to developing mathematical potential energy functions capable of adequately modeling experimentally observable data [155–157]. These groups include the developers of the AMBER [156, 158], CHARMM [159, 160], OPLS-AA [157, 161, 162] and GROMOS [163–166] force fields, whose mathematical functions describing the potential energy $\vartheta_{\mathrm{total}}$ are essentially identical:

$$\vartheta_{\mathrm{total}} = \vartheta_{\mathrm{bond}} + \vartheta_{\mathrm{angle}} + \vartheta_{\mathrm{proper}} + \vartheta_{\mathrm{improper}} + \vartheta_{\mathrm{Lennard\text{-}Jones}} + \vartheta_{\mathrm{electrostatic}} \tag{1.7}$$

$$\vartheta_{\mathrm{total}}(r^{N}) = \sum_{\mathrm{bonds}} \tfrac{1}{2} k_b (b - b_0)^2 + \sum_{\mathrm{angles}} \tfrac{1}{2} k_\theta (\theta - \theta_0)^2 + \sum_{\mathrm{dihedrals}} \tfrac{1}{2} k_\phi \left[1 + \cos(n\phi + \gamma)\right] + \sum_{\mathrm{impropers}} \tfrac{1}{2} k_\xi (\xi - \xi_0)^2 + \sum_{i}^{N} \sum_{j \neq i}^{N} 4\varepsilon_{ij} \left[ \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6} \right] + \sum_{i}^{N} \sum_{j \neq i}^{N} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \tag{1.8}$$

A force field is thus a set of parameters calibrated to reproduce certain target properties when applied to Equation 1.8. The ability of a force field to adequately describe a set of molecules is directly linked to the accuracy and diversity of its topological parameters [167–169].
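The individual terms of Equation 1.8 can be written as short functions, which makes explicit which parameters a force field has to supply for each term. This is a minimal sketch that follows the generic functional form of Equation 1.8 as written, not the exact implementation of any specific force field or simulation package. The function names are hypothetical, and the units (kJ/mol, nm, radians, elementary charges) are an assumption made for the example.

```python
import math

# Electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
# (assumes distances in nm and charges in units of the elementary charge).
COULOMB_CONST = 138.935458


def bond_energy(b, b0, kb):
    """Harmonic bond stretching: 1/2 kb (b - b0)^2."""
    return 0.5 * kb * (b - b0) ** 2


def angle_energy(theta, theta0, ktheta):
    """Harmonic angle bending: 1/2 ktheta (theta - theta0)^2."""
    return 0.5 * ktheta * (theta - theta0) ** 2


def proper_dihedral_energy(phi, kphi, n, gamma):
    """Periodic torsion: 1/2 kphi [1 + cos(n*phi + gamma)]."""
    return 0.5 * kphi * (1.0 + math.cos(n * phi + gamma))


def improper_dihedral_energy(xi, xi0, kxi):
    """Harmonic improper dihedral: 1/2 kxi (xi - xi0)^2."""
    return 0.5 * kxi * (xi - xi0) ** 2


def lennard_jones_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones interaction for one atom pair."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb_energy(r, qi, qj):
    """Coulomb interaction for one atom pair."""
    return COULOMB_CONST * qi * qj / r


# Usage: the Lennard-Jones minimum lies at r = 2^(1/6) * sigma with depth -epsilon.
sigma, epsilon = 0.34, 0.65  # hypothetical parameters (nm, kJ/mol)
r_min = 2.0 ** (1.0 / 6.0) * sigma
print(lennard_jones_energy(r_min, epsilon, sigma))
```

Summing each function over the corresponding bonds, angles, dihedrals and atom pairs of a molecule gives the total potential energy. The parameters passed to these functions (force constants, reference values, ε, σ and partial charges) are what a force field calibrates.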
However, generating new parameters frequently involves new quantum calculations and an additional, often high, computational cost. Force fields are therefore assigned the attribute of modularity, which consists of parametrizing modules and using them to describe more complex molecules [149, 167]. A common example is the parametrization of amino acid residues and their use in describing whole proteins. It is further assumed that well-calibrated modules have an additive effect on one another, which allows more complex molecules to be described [149, 167].

1.4.2 Force fields for small ligands

Most current force fields were initially developed for the systematic description of biomolecules [169]. This is partly due to the lower chemical diversity of the monomers that make up biomolecules and, consequently, the smaller number of parameters required [149]. For the systematic description of small ligands, the possible chemical diversity is many times greater, making it harder to calibrate a single parameter set that accurately describes the molecules of interest. Some efforts in this direction must nonetheless be recognized. In 1992, Rappe et al. [170] developed a parameter set for all atoms of the periodic table (hence the name "Universal force field", UFF), using the same potential function described in Equation 1.8. The authors calibrated the parameters using crystallographic data of small ligands as target properties [170]. Another example is the MMFF94 force field from Merck, developed by Halgren [171]. A major distinguishing feature of MMFF94 is its particular potential function, with anharmonic bond and angle potentials and a 14-7 Lennard-Jones-type potential instead of the commonly used 12-6 potential [149, 171].
Halgren used structural and energetic properties calculated by quantum methods and obtained a good correlation with the data calculated by molecular mechanics [171]. However, the lack of condensed-phase experimental data can be considered a potential weakness both in the parametrization of UFF and MMFF94 and in their test sets [149]. Not surprisingly, sublimation energies calculated with MMFF94 were systematically underestimated by 30-40% [149].

More recently, in the 2000s, the parametrization strategies of the force fields commonly used for biomolecules were naturally extended to small ligands, in order to describe receptor-ligand complexes with mutually compatible parameters. The GAFF force field (Generalized Amber Force Field) [172] adopted the same parametrization strategy as the AMBER force field, reusing most of the Lennard-Jones parameters of the canonical force field together with the torsional potentials [172]. Partial atomic charges are derived from the electrostatic potential surface obtained by quantum calculations [172]. Another initiative was the CGenFF force field [160], compatible with CHARMM. Also reusing parameters from the canonical force field, CGenFF adopts some of the existing Lennard-Jones potentials and derives the remaining topological properties from quantum calculations [160]. In particular, partial atomic charges are derived from quantum calculations using the dipole moment as a guide and are further optimized by evaluating the interaction energies of the ligand with water molecules [160]. It is also worth noting that the OPLS force field family includes the OPLS3 parameter set, designed specifically for small ligands [162, 173], but it is licensed as part of the Schrödinger software.
Finally, the GROMOS force field family does not yet have an official parameter set for small ligands, unlike the other families. However, the ATB server (Automated Topology Builder) [174] has been used to obtain parameters for small ligands automatically, in a manner compatible with the GROMOS philosophy [175]. The Lennard-Jones, bond, angle and torsional parameters used by ATB come from the canonical force fields, whereas partial atomic charges are obtained from quantum calculations and database-driven approaches [175]. More recently, the GROMOS development group demonstrated the use of machine learning methods to derive partial atomic charges [176], although without applying them directly to the GROMOS force field.

1.4.2.1 Quantum mechanics in force field parametrization

With the popularization of quantum mechanics (QM) calculations in the early 1970s, researchers used these methods to generate topological parameters for small organic molecules [177–179], generally to derive partial atomic charges [180, 181] and torsional potentials [179]. In the 1990s, increased computing power and the application of QM methods such as density functional theory (DFT) [182, 183] to larger organic compounds led to wider use of QM in force field development. Early work included quantum calculations to compute φ and ψ maps of proteins and refine these torsions for AMBER parameters [156], to calculate conformational energies of peptides and optimize CHARMM parameters [159], and to optimize the torsional potential constants of the OPLS-AA parameters [161]. Today, QM is closely tied to at least part of the parametrization of new compounds [158, 168, 184–186].
For the AMBER, CHARMM and OPLS force field families, for example, partial atomic charges are systematically derived from the electrostatic potential surface calculated by quantum methods, usually at the HF/6-31G* [187, 188] or MP2/6-31G* [188, 189] level. Another example is the parametrization of new torsional potentials, which is usually done by calculating the torsional energy barrier with quantum methods and then deriving cosine-function parameters that describe it within molecular mechanics [158, 165, 190, 191].

1.4.2.2 Parametrization in the GROMOS philosophy

The GROMOS force field was originally developed together with a simulation software of the same name, starting with the GROMOS37C4 version of the force field [192]. Its parameter set later reached other software packages and was greatly expanded over the years, with GROMOS43a1 and GROMOS43a2 [163], GROMOS45a4 [164], GROMOS53a5 and GROMOS53a6 [165], and GROMOS54a7 and GROMOS54a8 [166], in addition to reparametrizations of some terms already included in older versions of the series [193]. All these versions aimed to parametrize building blocks of proteins, lipids and nucleic acids. More recently, efforts have been made toward the correct topological description of carbohydrates [186, 194, 195] and aromatic rings [185] within the GROMOS philosophy.

The GROMOS force field uses a pseudoatom (united-atom) representation for aliphatic CH1, CH2 and CH3 groups. The justification is that these hydrogens make few or almost no electrostatic interactions because of their apolar character [163]. The Lennard-Jones parameters of these groups are therefore calibrated to describe the hydrophobic interactions they make with their surroundings [163, 169]. Parametrization in the GROMOS philosophy has been based on the physicochemical properties of organic liquids since version 43a1.
In short, the topological terms of a molecule are defined empirically and used to simulate the physicochemical properties of an organic liquid, which are then compared with the corresponding experimental properties with the aim of minimizing the difference [165, 166]. In theory, the goal of this approach is to bring the behavior of the simulated molecules in the condensed phase close to the behavior observed experimentally [163, 165]. Historically, GROMOS parameters have been calibrated against target properties such as density, enthalpy of vaporization and solvation free energy, and have also been tested on the description of interproton distances obtained by NMR [196].

1.5 Conformational sampling of ligands

The conformational space of small molecules, considering bond lengths, angles and torsions, can be enormous. Even in the crystalline (that is, solid) state, small molecules can adopt a variety of conformational states, as observed in the structures deposited in the Protein Data Bank (PDB) [21] and the Cambridge Structural Database (CSD) [197]. For example, a molecule with 10 rotatable bonds can have more than 59 thousand conformations, which makes the search for biologically relevant conformations a methodological challenge [198]. According to Hawkins [198], a conformer can be defined as a distinct molecular conformation associated with a minimum on the potential energy surface of a given molecule in a given chemical context. Accordingly, efforts have been devoted to developing computational methodologies for sampling the conformational space of ligands.

Computational techniques are commonly used to study ligand conformations and to predict binding modes with their respective target receptors [23, 199]. More specifically, Monte Carlo (MC) simulations have been used to study the conformational space of the interaction between two molecules [23, 199, 200].
The method consists of a stochastic search for possible conformations of the system using a potential function and a force field [23, 199–201], which allows different configurations and conformations of the system to be associated probabilistically with their respective potential energies. In theory, and assuming the system is ergodic, MC simulations allow conformational sampling of the energy surface [23]. Another application of MC simulations is the development of topological parameters in the condensed phase, as was done for the OPLS force field in the late 1980s [157]. These parameters were later incorporated into other force fields, such as AMBER [156], or inspired the development of new parameters [159, 163].

While some sampling methods explore the entire accessible low-energy conformational space of a molecule using stochastic methods, at a higher computational cost, systematic methods restrict the search, trading a possibly small loss of accuracy for a possibly large gain in cost-effectiveness [198]. One example of such a restriction in systematic methods is the rigid rotor (RR) approximation, in which bond lengths and angles are kept fixed while only the torsions are varied in search of different conformers [198, 202]. Systematic methods frequently use genetic algorithms in the search for conformers [203–206]. It should be noted, however, that the presence of solvent is generally ignored in systematic methods, which may affect the description of the lowest-energy conformations of a given molecule [202]. Stochastic methods, on the other hand, can take the explicit presence of solvent into account, albeit at a higher computational cost [202].
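The size of a rigid-rotor search can be illustrated with a short sketch. It assumes, purely for illustration, that each rotatable bond is sampled at three staggered torsion values (60°, 180° and 300°); the function names and the angle grid are hypothetical and are not taken from any of the cited tools. Under that assumption, 10 rotatable bonds give 3^10 = 59,049 torsion combinations, in line with the figure of more than 59 thousand conformations quoted for a molecule with 10 rotatable bonds.

```python
from itertools import product

# Assumed torsion grid: three staggered values per rotatable bond (illustrative).
STAGGERED_ANGLES = (60.0, 180.0, 300.0)


def count_conformations(n_rotors, states_per_rotor=len(STAGGERED_ANGLES)):
    """Number of torsion combinations visited by a full systematic search."""
    return states_per_rotor ** n_rotors


def enumerate_torsions(n_rotors, angles=STAGGERED_ANGLES):
    """Lazily yield every torsion combination; bond lengths and angles stay fixed."""
    return product(angles, repeat=n_rotors)


if __name__ == "__main__":
    print(count_conformations(10))  # 3**10 = 59049
    for torsions in enumerate_torsions(2):
        print(torsions)
```

The generator is lazy on purpose: the number of combinations grows exponentially with the number of rotatable bonds, so storing them all quickly becomes impractical.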
Some tools used for conformational sampling of ligands are shown in Table 3.

Table 3 – Examples of tools used for conformational sampling of ligands.

Tool        Type        Algorithm                            Reference
Balloon_GA  Stochastic  Genetic algorithm                    Vainio and Johnson [203]
ETKDG       Stochastic  Geometry + database                  Riniker and Landrum [207]
Frog2       Stochastic  Monte Carlo                          Miteva, Guyon and Tufféry [208]
MC-Dock     Systematic  Brute force; anchoring and expansion Sauton et al. [209]
OMEGA       Systematic  Database; complete enumeration       Hawkins et al. [210]
RDKit       Stochastic  Geometry                             Riniker and Landrum [207]

1.5.1 Molecular dynamics in the characterization of ligands

Molecular dynamics (MD) is a computational simulation method that uses classical (Newtonian) mechanics to describe the motion of particles in a system under analysis [167]. In this description, atoms are treated as perfectly spherical bodies, and the mathematical model for the dynamics of atoms in molecules is given by Newton's second law:

\[ \frac{d\vartheta}{dr_i} = -m_i \frac{d^2 r_i}{dt^2} \tag{1.9} \]

in which atom i, of mass m_i, is displaced by dr_i in a time dt. Given the initial velocities and positions, integrating this equation over dt for all atoms describes the motion of the atoms in the system over time [167]. Hence the name molecular dynamics.

The popularization of biomolecular force fields in the 1990s and 2000s, together with the development of specific parameters for small ligands, enabled greater efforts in studying the dynamics of receptor-ligand complexes [169]. From a pharmacological standpoint, the conformational dynamics of a ligand bound to its target receptor can reveal valuable information about its binding mode and thereby support the rational design of new drugs [119, 211]. Indeed, some success stories can be mentioned.
The combined use of molecular docking and molecular dynamics simulations allowed Andrea Cavalli et al. [212] to identify different binding modes of the ligand propidium in the peripheral anionic site of the human acetylcholinesterase enzyme, which were compatible with electron density maps obtained by X-ray crystallography. In another study, Kacker et al. [213] combined QM calculations, molecular docking and MD simulations to infer the protonation state of the BACE-1 enzyme complex, also identifying water molecules with a structural role. From the perspective of describing the dynamics of small molecules, we can also cite the work of Figueira et al. [214], which reports the use of MD simulations to infer binding modes of acetate ions with [28]hexaphyrin molecules, corroborating NMR data and colorimetric assays. Other studies characterizing natural compounds were also carried out and were able to relate the simulations to biological data [215, 216].

Although they provide relevant information on possible receptor-ligand binding modes, unbiased MD simulations such as those described above are not recommended for describing the Gibbs free energy of complex formation, a property of great pharmacological interest [118, 217]. This is because the calculation of G, along with entropy-related energies, is directly affected by the limited sampling intrinsic to classical MD methods [218]. In probabilistic terms, high-energy configurations are visited far less often during a simulation than low-energy regions, and energy barriers of just a few k_BT hinder efficient exploration of the conformational space, impairing the estimate of G [218].
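The probabilistic argument can be made concrete with Boltzmann weights. The sketch below assumes a set of discrete states with known relative energies in kJ/mol; the function name and the example energies are hypothetical. At equilibrium, a state lying ΔE above the minimum is populated in proportion to exp(−ΔE/k_BT), so a state only 3 k_BT higher is already visited roughly 20 times less often.

```python
import math

R_KJ = 0.008314462618  # gas constant in kJ/(mol K), i.e. k_B per mole


def boltzmann_populations(energies_kj_mol, temperature=298.15):
    """Normalized Boltzmann populations for a set of discrete states."""
    beta = 1.0 / (R_KJ * temperature)
    e_min = min(energies_kj_mol)  # shift energies for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies_kj_mol]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    kt = R_KJ * 298.15  # about 2.48 kJ/mol at room temperature
    # Hypothetical states at 0, 3 and 10 k_BT above the minimum.
    populations = boltzmann_populations([0.0, 3.0 * kt, 10.0 * kt])
    print([round(p, 5) for p in populations])
```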
Enhanced sampling methods have therefore been developed to describe thermodynamic and kinetic properties of receptor-ligand complexes. Methods such as free energy perturbation (FEP) or thermodynamic integration (TI) are widely used to predict binding affinities and to optimize lead compounds [219–221]. In addition, methods based on collective variables, such as metadynamics, steered dynamics or umbrella sampling, used to compute the free energy and kinetic parameters associated with receptor-ligand complex formation, have provided important demonstrations of the pharmacological potential of MD-related methods in the development of new drugs [222–228]. It is true, however, that the high computational cost of these techniques hinders their use on a large scale, as is done with other approaches in SBDD [218]. Nevertheless, increases in computational power and in the accuracy of force field parameterization for small ligands may, in the not-too-distant future, raise interest in these methods, making it even more necessary to establish protocols for the systematic parameterization of small molecules using condensed-phase properties as calibration targets. In particular, the study of the flexibility of bioactive molecules in solution can also benefit from accurate parameters and can, together with conformational characterization methodologies, help in prospecting potential drug-receptor interactions.

2 Justification

Given the scientific challenge of describing, on a large scale, the conformational profile of small molecules in solution through chemical and biological assays, computational methods have become an important low-cost, fast methodological alternative for providing information at atomistic resolution.
In addition, the GROMOS force field family has a series of very well established parameters for describing proteins, lipids and carbohydrates. However, up to the start of this work, the GROMOS series had not yet received any official update for the conformational and energetic description of small ligands. It should also be stressed that the parameters of the GROMOS family are calibrated to reproduce physicochemical properties of the liquid phase, such as enthalpies of vaporization and solvation free energies, which is considered an advantage in describing intra- and intermolecular interaction energies. Moreover, proper characterization and quantification of the conformational populations of ligands in solution requires the development of an analytical method with a high level of detail, which is both a creative and an intellectual challenge.

A better understanding of these issues will thus make it possible to gather information on the dynamics of drugs in solution, their interactions with the solvent, the conformational populations sampled and which functional groups are accessible for interaction with the receptor. This knowledge has the potential to provide a solid basis for a more detailed understanding of the dynamics of molecular recognition in the formation of receptor-ligand complexes.

3 Objectives

The aim of this work is to study the flexibility of small ligands free in solution through molecular dynamics simulations, in order to prospect possible drug-receptor interactions. A methodology for the systematic parameterization of ligands will be developed, along with methods for the conformational sampling and characterization of ligands. Finally, we will evaluate the dynamics of ligands in solution and infer its impact on molecular recognition. To this end, the work was divided into 3 chapters, each with its specific objectives:

I. Develop methodologies for the systematic parameterization of ligands, using aromatic rings as a proof of concept;
   i. Generate new charge groups for aromatic rings commonly found in drugs;
   ii. Validate the models using correlations between calculated and experimental physicochemical properties;
   iii. Characterize and quantify the interactions of the heteroatoms with the solvent.
II. Analyze the conformational sampling of molecules in solution in a systematic way;
   i. Parameterize new torsional profiles for dihedrals of chalcones and flavonoids;
   ii. Build the topology of these molecules and validate it against NOESY data;
   iii. Evaluate the impact of vicinal substitutions on the rotational free energies of the dihedrals;
   iv. Characterize and quantify the simulated conformational populations;
   v. Evaluate the feasibility of automating the conformational characterization of ligands.
III. Evaluate the dynamics of ligands in solution and infer its impact on molecular recognition;
   i. Parameterize new torsional profiles for dihedrals of the ligand PIK75;
   ii. Characterize and quantify the simulated conformational populations;
   iii. Correlate the populations with the crystallographic structure of PIK75 in complex with GSK-3β.

4 Methodological procedures

“... everything that living things do can be understood in terms of the jigglings and wigglings of atoms.” Richard Feynman

The study of the dynamics of bioactive compounds free in solution depends on the quality of the topological descriptors used in simulations. In this respect, dihedral parameters are especially important because they define the angular preferences of rotatable bonds under the explicit effect of the solvent or of intramolecular interactions. Atomic partial charges also influence the interaction energies between the compounds and the solvent, affecting the preferred conformations in solution.
For this reason, this thesis focused on establishing a topology-building protocol based on the GROMOS philosophy [165, 184, 193], in order to increase confidence in the description of the conformational populations of ligands sampled in solution, as well as of their interaction energies with the solvent and with their target receptors. We also worked to establish an analytical methodology suited to the proper identification and characterization of the simulated conformational populations (Figure 7).

4.1 Derivation of partial charges for MM

Initially, atomic partial charges were derived by quantum methods at the MP2/6-31G* level of theory [188, 189] with implicit solvation by the Polarizable Continuum Model (PCM) [229], followed by a fit to reproduce the electrostatic potential (ESP) surface [230]. The coordinates of the optimized geometry and the atomic partial charges were used to compute the direction, sense and magnitude of the dipole moment vector.

4.2 Least Square Fit Solution

The dipole moment vector computed by quantum methods was used as the reference for deriving charges for classical mechanics (Figure 8). To this end, the proposed algorithm applies upper and lower bounds to the charge of each atom, so that its solution reproduces the direction and sense of the dipole moment obtained by QM [185]. The magnitude of the vector could then be adjusted to reproduce experimental properties, as done in the work of Polêto et al. [185] and described in Section 4.5. This approach also allows charge groups already parameterized in other works and calibrated to reproduce experimental properties to be reused.

[Figure 7: flowchart in three blocks. Quantum calculations: ESP calculation and quantum torsional profile, both at MP2/6-31G*. Topology building and simulations: least-squares fit solution based on the dipole moment vector, rescaling of the MM dipole moment until convergence, organic liquid simulations and thermodynamic properties, GROMOS53a6 bond, angle and LJ parameters, MM torsional profile and new MM torsional parameters via the RotProf web server. Analyses: simulation in solvent, dihedral distribution, metadynamics calculations of torsions, identification of intramolecular interactions, characterization of conformational populations and docking studies.]

Figure 7 – General flowchart of the methodological protocol developed in this thesis, from the parameterization of new molecules to their simulation and analysis. Computationally expensive steps are highlighted in blue, inexpensive steps in green, and the choice or generation of parameters in yellow.

4.3 Topology construction

To build the topology of the molecules of interest, bond, angle and Lennard-Jones parameters were taken from the GROMOS53a6 force field [165], while atomic partial charges were obtained by the Least Square Fit method. New topological terms for the dihedral description were computed in order to ensure greater accuracy in describing torsional preferences.
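The idea behind the bounded fit of Section 4.2 can be sketched as follows. This is not the algorithm of reference [185]: it swaps in a plain projected gradient descent, and the three-atom geometry, bounds, penalty weight and function names are all hypothetical choices made for illustration. The sketch looks for charges whose dipole matches a (possibly rescaled) reference vector while every charge stays inside its lower and upper bound and the net charge is restrained by a quadratic penalty.

```python
def dipole(charges, coords):
    """Dipole vector sum_i q_i * r_i (units follow the inputs, e.g. e*Angstrom)."""
    return [sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)]


def fit_charges(coords, mu_qm, bounds, total_charge=0.0, scale=1.0,
                weight=1.0, lr=0.05, steps=5000):
    """Box-constrained least squares solved by projected gradient descent.

    Minimizes |mu(q) - scale*mu_qm|^2 + weight*(sum(q) - total_charge)^2
    and clips every charge to its (lower, upper) bound after each step.
    The step size lr suits the small toy system below, not the general case.
    """
    target = [scale * m for m in mu_qm]
    q = [min(max(0.0, lo), hi) for lo, hi in bounds]  # feasible starting guess
    for _ in range(steps):
        d = [m - t for m, t in zip(dipole(q, coords), target)]
        dq = sum(q) - total_charge
        grad = [2.0 * sum(d[k] * r[k] for k in range(3)) + 2.0 * weight * dq
                for r in coords]
        q = [min(max(qi - lr * g, lo), hi)
             for qi, g, (lo, hi) in zip(q, grad, bounds)]
    return q


if __name__ == "__main__":
    # Hypothetical bent triatomic: one central atom and two equivalent outer atoms.
    coords = [(0.0, 0.0, 0.0), (0.8, 0.6, 0.0), (-0.8, 0.6, 0.0)]
    bounds = [(-1.0, 0.0), (0.0, 0.6), (0.0, 0.6)]
    charges = fit_charges(coords, mu_qm=[0.0, 0.48, 0.0], bounds=bounds)
    print([round(c, 4) for c in charges], dipole(charges, coords))
```

The `scale` argument mirrors the step in which only the magnitude of the dipole is changed while its direction and sense are kept; tightening a pair of bounds to a single value would play the role of fixing a previously calibrated charge group.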
[Figure 8: diagram going from the MP2/6-31G* ESP, through the Least Square Fit approach, to the MM atomic partial charges.]

Figure 8 – Scheme for generating atomic partial charges. The dipole moment vector is computed by quantum methods and used as the reference for deriving a new set of atomic charges based on a linear regression with lower and upper bounds. In blue, a charge group fixed on the basis of benzene; in red, the determination of bounds.

4.4 Generation of torsional profiles

The torsional profile of new dihedrals was obtained from quantum calculations, in which the initial molecular geometry was optimized using the Hartree-Fock method [187, 231] with the 3-21G* basis set [74]. The total energy of the molecule was then optimized and computed for each dihedral orientation at the MP2/6-31G* level, every 30°, yielding the location of the energy minima and the magnitude of the barriers. To compute the torsional profile in molecular mechanics, the same dihedral orientations evaluated by quantum methods were held fixed during an energy minimization using the conjugate gradient algorithm in GROMACS 5.0.7, with a convergence criterion of 1.0 kJ/mol. Both torsional profiles were submitted to the Rotational Profiler server [232], which computed molecular mechanics parameters that, once inserted into the topology, reproduce the barriers and energy minima computed by quantum methods.

4.5 Simulation of organic liquids

Once the topology had been built, a calibration step for the partial charges was included in the protocol whenever experimental physicochemical properties of organic liquids exist for the fragment/compound to be analyzed. To build the organic liquid system to be simulated (Figure 9), cubic boxes of 2×2×2 nm were built, each containing a single molecule.
A total of 125 of these boxes were stacked to form a single 10×10×10 nm box, which was simulated at high pressure (100 bar) to induce the liquid phase. The systems were then equilibrated at 1 bar, and the box was stacked again to obtain a single box of 1000 molecules in the liquid phase. This system was equilibrated once more until the total energy of the system varied by no more than 0.5 J/(mol × ns × degrees of freedom). This criterion is needed to ensure greater accuracy in the calculation of the physicochemical properties [168]. All simulations used the Berendsen pressure and temperature coupling algorithm [233], with τT = 0.2 ps and τP = 0.5 ps. When available, experimental values of the isothermal compressibility (κT) and dielectric constant (ε) of the organic liquids were used as additional simulation parameters. Electrostatic interactions were computed with the Reaction-Field method [234, 235], using a twin-range cutoff of 0.8 nm and 1.4 nm and the experimental dielectric constant of the organic liquid beyond the cutoff.

4.6 Calculation of physicochemical properties

In order to calibrate the set of charges generated by the Least Square Fit method (Section 4.2), physicochemical properties of the organic liquids were calculated and correlated with the experimental properties published in the literature [236–243]. To compute the liquid density (ρ), constant-pressure simulations were run for 10 ns and ρ was computed as the average over 5 blocks of 2 ns [168]. The enthalpy of vaporization (∆Hvap) was also computed by block averaging, using the same 10 ns to obtain the potential energy of the liquid phase (Epot(l)) and 100 ns gas-phase simulations, run with a stochastic dynamics (SD) integrator [244], to obtain the potential energy of the gas phase (Epot(g)). ∆Hvap was then computed as:

\[ \Delta H_{vap} = \left( E_{pot}(g) + k_B T \right) - E_{pot}(l) \tag{4.1} \]
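A minimal sketch of the block averaging and of Equation 4.1 is given below. It assumes energies in kJ/mol and, as an explicit assumption not stated in the text, that the liquid-phase potential energy of the whole box is divided by the number of molecules before being used in the equation. Function names and the numbers in the usage example are hypothetical.

```python
import math

R_KJ = 0.008314462618  # gas constant in kJ/(mol K), i.e. k_B per mole


def block_average(series, n_blocks=5):
    """Mean of the block means and its standard error."""
    size = len(series) // n_blocks
    means = [sum(series[i * size:(i + 1) * size]) / size for i in range(n_blocks)]
    mean = sum(means) / n_blocks
    variance = sum((m - mean) ** 2 for m in means) / (n_blocks - 1)
    return mean, math.sqrt(variance / n_blocks)


def vaporization_enthalpy(epot_gas, epot_liquid_box, n_molecules, temperature):
    """Equation 4.1, with the liquid energy taken per molecule (assumption)."""
    return (epot_gas + R_KJ * temperature) - epot_liquid_box / n_molecules


if __name__ == "__main__":
    # Hypothetical time series of liquid-box potential energies (kJ/mol).
    liquid_series = [-30100.0, -29900.0, -30050.0, -29950.0, -30000.0] * 20
    epot_liquid, error = block_average(liquid_series, n_blocks=5)
    print(vaporization_enthalpy(10.0, epot_liquid, 1000, 298.15), error)
```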
[Figure 9: workflow. Stacking of 125 molecules and induction of the liquid phase; stacking to 1000 molecules and equilibration; NPT coupling at three temperatures (T−10 K, T, T+10 K), giving CP cla and αP; NVT coupling at three pressures (0.9P, P, 1.1P bar), giving κT; gas-phase and liquid-phase simulations, giving ∆Hvap, ρ and ε.]

Figure 9 – Scheme for simulating organic liquids. After equilibration, the system is simulated at different temperatures and pressures to obtain its physicochemical properties.

To compute the dielectric constant (ε), the liquid simulations from which ρ was obtained were extended until ε converged, with ε computed using running averages. To obtain the thermal expansion coefficient (αP) and the classical heat capacity (CP cla), three constant-pressure simulations were run for 5 ns, at temperatures T, T+10 K and T−10 K, respectively. αP and CP cla were computed using the finite-difference method of Kunz and Van Gunsteren [245]:

\[ \alpha_P \approx \frac{1}{V} \left( \frac{\partial V}{\partial T} \right)_P \approx - \frac{\ln \langle \rho \rangle_{T_2} - \ln \langle \rho \rangle_{T_1}}{T_2 - T_1} \tag{4.2} \]

and:

\[ C_P \approx \left( \frac{\partial U}{\partial T} \right)_P \approx \frac{\langle U \rangle_{T_2} - \langle U \rangle_{T_1}}{T_2 - T_1} \tag{4.3} \]

To compute the isothermal compressibility (κT), three constant-volume simulations were run for 5 ns each, changing the volume of the simulation box to generate densities corresponding to 100%, 90% and 110% of the computed average density ρ. κT was computed by finite differences as:

\[ \kappa_T \approx - \frac{1}{V} \left( \frac{\partial V}{\partial P} \right)_T \approx \frac{\ln \rho_2 - \ln \rho_1}{\langle P \rangle_{\rho_2} - \langle P \rangle_{\rho_1}} \tag{4.4} \]

It is important to stress that the absolute error between the calculated and experimental physicochemical properties of organic liquids was used as the topological validation criterion for each simulated liquid. Where the absolute errors were too large, a new set of charges was derived using the Least Square Fit method (Section 4.2), adjusting only the magnitude of the dipole moment vector.
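The finite-difference estimates of Equations 4.2 to 4.4 reduce to a few lines of arithmetic once the simulation averages are available. The sketch below assumes those averages (densities, total energies and pressures) have already been computed; function names and the numbers in the usage example are hypothetical.

```python
import math


def thermal_expansion(rho_t1, rho_t2, t1, t2):
    """Equation 4.2: alpha_P from mean densities at two temperatures."""
    return -(math.log(rho_t2) - math.log(rho_t1)) / (t2 - t1)


def heat_capacity(u_t1, u_t2, t1, t2):
    """Equation 4.3: C_P from mean total energies at two temperatures."""
    return (u_t2 - u_t1) / (t2 - t1)


def isothermal_compressibility(rho_1, rho_2, p_rho1, p_rho2):
    """Equation 4.4: kappa_T from mean pressures at two fixed densities."""
    return (math.log(rho_2) - math.log(rho_1)) / (p_rho2 - p_rho1)


if __name__ == "__main__":
    # Hypothetical averages: densities in kg/m3, temperatures in K,
    # energies in kJ/mol and pressures in bar.
    print(thermal_expansion(885.0, 865.0, 288.15, 308.15))     # 1/K
    print(heat_capacity(-31.2, -29.6, 288.15, 308.15))         # kJ/(mol K)
    print(isothermal_compressibility(790.0, 960.0, -900.0, 1500.0))  # 1/bar
```

Using the two outer state points (T−10 K and T+10 K, or 90% and 110% of the density) gives a central difference around the reference state.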
This re-derivation of the charges was repeated iteratively until the absolute error with respect to the experimental properties was mitigated.

4.6.1 Solvation free energy simulations

In addition, simulations in water were carried out to evaluate the solvation free energies (∆Ghyd) at 1 bar and 298 K. Each molecule was centered in a cubic box with dimensions appropriate to reproduce the density of the SPC water model (0.997 g/cm3). Free energies were computed by thermodynamic integration, with a coupling parameter λ used to perturb the solute-solvent interactions according to:

\[ \Delta G_{sim} = \int_0^1 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda} d\lambda \tag{4.5} \]

where H is the Hamiltonian, λ = 0 refers to the state in which solute-solvent interactions are absent and λ = 1 to the state in which the solute interacts fully with the solvent (Figure 10). In our calculations, electrostatic interactions were decoupled before the Lennard-Jones interactions, using a soft-core potential to avoid problems related to excessively strong Lennard-Jones interactions [246]. The soft-core power was set to 1 and αLJ to 0.5, following the recommendations of Shirts and co-workers [247]. Both interactions were decoupled using the λ values 0, 0.02, 0.04, 0.07, 0.1, 0.15, 0.2, ..., 0.8, 0.85, 0.9, 0.93, 0.96, 0.98 and 1.0, for a total of 50 simulations. The simulation protocol consisted of an initial minimization with the steepest-descent algorithm, followed by another with the L-BFGS algorithm until the maximum force in the system fell below 10 kJ mol⁻¹ nm⁻¹. Initial velocities were then assigned and the system was equilibrated for 100 ps under NVT coupling for each λ. The systems underwent a further 100 ps of equilibration under NPT coupling, using the Parrinello-Rahman barostat [248], τt = 5 ps and a compressibility of 4.5 × 10⁻⁵ bar⁻¹. Finally, the production phase was simulated using the Langevin integrator [244] to sample ⟨∂H/∂λ⟩λ. The simulation time varied between 1 and 5 ns. The last conformation adopted by the system in the production phase was used as the starting point for the simulation of the subsequent λ.

Figure 10 – Scheme of a solvation free energy simulation using the thermodynamic integration method. The parameter λ modulates the interaction between solute and solvent from complete absence (λ = 0) to full interaction (λ = 1).

The physicochemical properties computed for the organic liquids were compared with their experimental values through the absolute error (Equation 4.6), in order to assess the quality of the final set of charges. When this error was too large, the magnitude of the dipole moment was adjusted using the Least Square Fit method. A new round of organic liquid simulations was then carried out, followed by calculation of the physicochemical properties and of the absolute errors, until a set of charges that best described the evaluated properties was obtained.

\[ E_{\%} = \frac{\left| \mathrm{Experimental} - \mathrm{Calculated} \right|}{\mathrm{Experimental}} \tag{4.6} \]

4.7 Simulation in solvent

In the absence of experimental physicochemical properties of organic liquids for the fragment/molecule of interest, the charge set cannot be calibrated along the lines of the GROMOS philosophy. In that case, the charge set to be used can be based on previously calibrated fragments [185, 214, 249]. The topology built for the ligand was then used to simulate its dynamics in solvent. To do so, a dodecahedral box was built around the solute and filled with SPC water models.
The system was minimized until the maximum force converged to values below 1.0 kJ mol⁻¹ nm⁻¹ and equilibrated under NVT coupling at the temperature of interest, using the Nosé-Hoover thermostat [250]. The production phase was simulated for long enough to observe multiple conformational transitions, using the V-rescale thermostat [251], the Parrinello-Rahman barostat [248], τT = 0.1 ps and τP = 2.0 ps. Electrostatic interactions were computed with the Reaction-Field method [234, 235], using a twin-range cutoff of 0.8 nm and 1.4 nm and a reaction-field dielectric constant εRF = 62 beyond the cutoff to model water [165, 252].

4.8 Metadynamics of torsions

Metadynamics simulations were carried out to evaluate the impact of the solvent and of vicinal substitutions on the torsions of interest, using the GROMACS 5.1.4 package together with the PLUMED 2.0b1 package [253]. The molecules were simulated for 50 ns in cubic boxes filled with the solvent of interest. The system was energy-minimized using the conjugate gradient algorithm and equilibrated for 2 ns under NVT coupling at the temperature of interest, using the Nosé-Hoover thermostat [250]. The system was then subjected to controlled metadynamics simulations, in which Gaussians with an initial height of 1.2 kcal.mol-1 and a width of 3 Å were applied to the torsions defined as collective variables for analysis (a process represented in Figure 11). The pressure was kept constant at 1 bar using the Parrinello-Rahman barostat and τP = 2.0 ps, while the temperature was kept constant using the V-rescale thermostat and τT = 0.1 ps. The LINCS method [254, 255] was applied to constrain covalent bonds, allowing an integration time step of 2 fs. Electrostatic interactions were computed with the Reaction-Field method [234, 235], using a twin-range cutoff of 0.8 nm and 1.4 nm and a reaction-field dielectric constant εRF = 62 beyond the cutoff to model water [165, 252].
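The bias accumulated in such a run is a sum of Gaussians deposited along the torsion. The sketch below is only a toy illustration and does not reproduce what PLUMED does internally: it assumes constant-height hills (plain, non-tempered metadynamics), a width expressed in degrees and a hypothetical list of hill centers. Under those assumptions the free energy along the torsion is estimated as the negative of the accumulated bias.

```python
import math


def periodic_delta(a, b):
    """Signed angular difference a - b wrapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def bias_potential(angle, hill_centers, height, sigma):
    """Sum of Gaussian hills deposited along a torsional collective variable."""
    return sum(
        height * math.exp(-periodic_delta(angle, c) ** 2 / (2.0 * sigma ** 2))
        for c in hill_centers
    )


def free_energy_estimate(grid, hill_centers, height, sigma):
    """Plain-metadynamics estimate F(s) = -V(s), shifted so its minimum is zero."""
    f = [-bias_potential(s, hill_centers, height, sigma) for s in grid]
    f_min = min(f)
    return [x - f_min for x in f]


if __name__ == "__main__":
    # Hypothetical hills: most were deposited near 60 degrees, fewer near 180.
    centers = [58.0, 60.0, 61.0, 63.0, 178.0, 182.0]
    grid = [float(a) for a in range(0, 360, 30)]
    profile = free_energy_estimate(grid, centers, height=1.2, sigma=15.0)
    for angle, f in zip(grid, profile):
        print(angle, round(f, 3))
```

The periodic wrapping matters for torsions: a hill deposited at −179° must also bias configurations at +179°, which a plain difference of angles would miss.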
The free energy surface was computed using the sum_hills tool of PLUMED, and the estimated errors were obtained by block averaging, while reweighting followed the work of Branduardi, Bussi and Parrinello [256].

4.9 Analyses

The analyses below aim to structurally characterize the main conformations of ligands free in solution, in order to reduce the conformational search space in molecular docking studies.

Figure 11 – Representative scheme of metadynamics simulations. The energy minima A, B and C are separated by the transition states TS1 and TS2. The bias introduced by metadynamics is represented by lines ranging from darker to red as a function of simulation time. The bias allows transitions between the energy minima, so that the energy surface of the system is scanned as a function of the chosen collective variables. Once the whole energy surface has been filled by the Gaussian potentials added by the bias, the system moves diffusively between A, B and C and the simulation can be stopped, yielding the free energy associated with each energy minimum.

4.9.1 Dihedral distribution along the simulation

The distribution of dihedral preferences of the molecules simulated in solvent was analyzed and used to identify conformational populations. In addition, in the simulations of molecules free in solution, the number of transitions between dihedral populations was monitored to ensure the best conformational sampling and, consequently, a sufficient simulation time.
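Monitoring dihedral populations and the transitions between them can be sketched in a few lines. The example assumes a time series of one dihedral angle in degrees and, purely for illustration, three equal-width bins of 120°; the bin edges, function names and sample trajectory are hypothetical, and in practice the bins would be chosen from the observed distribution.

```python
from collections import Counter


def assign_state(angle_deg, n_bins=3):
    """Map a dihedral angle to a bin index (assumed equal-width bins)."""
    width = 360.0 / n_bins
    return int((angle_deg % 360.0) // width) % n_bins


def population_frequencies(angles_deg, n_bins=3):
    """Fraction of frames spent in each dihedral bin."""
    counts = Counter(assign_state(a, n_bins) for a in angles_deg)
    return {state: n / len(angles_deg) for state, n in sorted(counts.items())}


def count_transitions(angles_deg, n_bins=3):
    """Number of frame-to-frame changes of dihedral bin."""
    states = [assign_state(a, n_bins) for a in angles_deg]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


if __name__ == "__main__":
    trajectory = [60.0, 65.0, 180.0, 175.0, -60.0, 61.0]  # hypothetical frames
    print(population_frequencies(trajectory))
    print(count_transitions(trajectory))
```

A low transition count for a given torsion suggests that its populations are not yet converged and that the simulation should be extended.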
4.9.2 Characterization of conformational populations

An algorithm was developed in partnership with the Structural Bioinformatics and Computational Biology Laboratory of the Institute of Informatics (UFRGS) to identify the conformational populations of a molecule and their respective frequencies [143]. The algorithm identifies the combinations of dihedral angles and computes their frequency along the simulation, thus providing quantitative data on the major conformational populations in solution. The algorithm also generates a set of conformations for each conformational population identified, allowing the analysis of intra- and intermolecular interactions for each population. The code is available in Appendix Section B.3.

4.9.3 Identification of intra- and intermolecular interactions

Since intra- and intermolecular interactions can affect the conformation of molecules in solution, we measured the hydrogen bonds between the heteroatoms and the solvent, as well as the solvation shell around these heteroatoms, the free energy of hydrogen-bond breaking and the hydrogen-bond residence time. In addition, when experimental interproton distances measured by NMR (NOESY) were available, explicit hydrogens were inserted into the previously computed trajectory, allowing these distances to be measured along the simulation. As shown in previous works [143, 216], this validation makes it possible to assess the accuracy of the conformational description of these molecules in solution.

5 Results

“Everything is theoretically impossible, until it is done.” Robert A. Heinlein

The results of this thesis were organized into the following chapters, as indicated in the objectives:

I. Development of methodologies for the systematic parameterization of ligands;
   Marcelo D. Polêto, Victor H. Rusu, Bruno I. Grisci, Marcio Dorn, Roberto D. Lins, Hugo Verli. Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities, Frontiers in Pharmacology, 2018, 9, 395.

II. Analysis of the systematic conformational sampling of molecules in solution;
   Pablo R. Arantes, Marcelo D. Polêto, Elisa B. O. John, Conrado Pedebos, Bruno I. Grisci, Marcio Dorn, Hugo Verli. Development of GROMOS-compatible parameter set for simulations of chalcones and flavonoids, Journal of Physical Chemistry B, 2019, in press.

III. Dynamics of ligands in solution and its impact on molecular recognition;
   Roberta Tesch, Christian Becker, Matthias P. Müller, Michael E. Beck, Lena Quambusch, Matthäus Getlik, Jonas Lategahn, Niklas Uhlenbrock, Fanny N. Costa, Marcelo D. Polêto, Pedro S. M. Pinheiro, Daniel A. Rodrigues, Carlos M. Sant’Anna, Fabio F. Ferreira, Hugo Verli, Carlos A. M. Fraga, Daniel Rauh. An Unusual Intramolecular Halogen Bond guides Conformational Selection, Angewandte Chemie, 2018, 57(31), 9970-9975.

IV. Chapter "Campos de Força", which will be part of the 2nd edition of the book "Bioinformática: da biologia à flexibilidade molecular";

5.1 Chapter I

Aiming to develop a parameterization strategy for small ligands, a series of 103 aromatic rings commonly used in rational drug design [257–261] was selected for this study. Among them, a calibration set of 42 rings, for which physicochemical properties of organic liquids are known, was selected, also allowing cross-analysis with the works of Caleman et al. [168] and Horta et al. [184]. After calibration, the same parameterization strategy was used to generate new topological terms for the remaining 61 aromatic rings, totaling 103 molecules.
These molecules were simulated in aqueous solution for 250 ns, and dynamic properties of the interactions of each of their heteroatoms with water were measured, such as the average number of hydrogen bonds (AverHB), their residence time (τHB), their lifetime (lifetimeHB), the free energy of breakage of these bonds (ΔGHB), and their occupancy along the simulation (Percent). Evaluating and quantifying these properties strengthens the theoretical support for drug design, since such properties cannot be deduced from two-dimensional structures. In addition, the results assist the rational planning of specific chemical groups, aiming at better interactions within the receptor-ligand complex or even a lower enthalpic cost of desolvation, which affects the binding free energy.

ORIGINAL RESEARCH
published: 24 April 2018
doi: 10.3389/fphar.2018.00395

Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities

Marcelo D. Polêto 1, Victor H. Rusu 2, Bruno I. Grisci 3, Marcio Dorn 3, Roberto D. Lins 4 and Hugo Verli 1*

1 Grupo de Bioinformática Estrutural, Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, 2 Swiss National Supercomputing Centre, Lugano, Switzerland, 3 Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, 4 Instituto Aggeu Magalhães, Fundação Oswaldo Cruz, Recife, Brazil

Edited by: Adriano D.
Andricopulo, University of São Paulo, Brazil
Reviewed by: Antonio Monari, Université de Lorraine, France; Gustavo Trossini, Universidade de São Paulo, Brazil
*Correspondence: Hugo Verli hverli@cbiot.ufrgs.br
Specialty section: This article was submitted to Experimental Pharmacology and Drug Discovery, a section of the journal Frontiers in Pharmacology
Received: 09 November 2017
Accepted: 05 April 2018
Published: 24 April 2018
Citation: Polêto MD, Rusu VH, Grisci BI, Dorn M, Lins RD and Verli H (2018) Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities. Front. Pharmacol. 9:395. doi: 10.3389/fphar.2018.00395

The identification of lead compounds usually includes a step of chemical diversity generation. Its rationale may be supported by both qualitative (SAR) and quantitative (QSAR) approaches, offering models of the putative ligand-receptor interactions. In both scenarios, our understanding of which interactions functional groups can perform is mostly based on their chemical nature (such as electronegativity, volume, melting point, lipophilicity etc.) instead of their dynamics in aqueous, biological solutions (solvent accessibility, lifetime of hydrogen bonds, solvent structure etc.). As a consequence, it is challenging to predict from 2D structures which functional groups will be able to perform interactions with the target receptor, at which intensity and relative abundance in the biological environment, all of which will contribute to ligand potency and intrinsic activity. With this in mind, the aim of this work is to assess properties of aromatic rings, commonly used for drug design, in aqueous solution through molecular dynamics simulations in order to characterize their chemical features and infer their impact in complexation dynamics.
For this, common aromatic and heteroaromatic rings were selected and assigned a new set of atomic charges based on the direction and magnitude of the dipole moment from MP2/6-31G* calculations, while the other topological terms were taken from the GROMOS53A6 force field. Afterwards, liquid physicochemical properties were simulated for a calibration set composed of nearly 40 molecules and compared to their respective experimental data in order to validate each topology. Given the reliability of the employed strategy, we expanded the dataset to more than 100 aromatic rings. Properties in aqueous solution such as solvent accessible surface area, H-bond availability, H-bond residence time, and water structure around heteroatoms were calculated for each ring, creating a database of potential interactions and shedding light on features of drugs in biological solutions, on the structural basis for bioisosterism, and on the enthalpic/entropic costs of ligand-receptor complexation dynamics.

Keywords: drug design, GROMOS, aromatic rings, functional groups, interactions

Frontiers in Pharmacology | www.frontiersin.org 1 April 2018 | Volume 9 | Article 395
Polêto et al. Aromatic Rings Interactions in Aqueous Solution

1. INTRODUCTION

The development of a drug is a multistep process, usually starting with the identification of hit compounds. The challenging task of optimizing these compounds into leads and finally into drugs is commonly facilitated by computer-aided drug design (CADD) techniques (Anderson, 2003; Sliwoski et al., 2013; Bajorath, 2015). With the growing information on protein structure in recent years, structure-based drug design (SBDD) has become a significant tool for hit discovery (Anderson, 2003; Lounnas et al., 2013; Lionta et al., 2014). When structural information on the receptor is absent, molecular fingerprints of approved drugs are also used to search for new ligands, in a process known as ligand-based drug design (LBDD) (Lee et al., 2011).
Nevertheless, there are still considerable challenges associated with predicting ligand potency and affinity via computational methods (Paul et al., 2010; Csermely et al., 2012). In general, optimization of lead compounds is based on qualitative or quantitative structure-activity relationships (SAR or QSAR, respectively) (Shahlaei, 2013). These relationships usually rely on molecular descriptors to predict ligand pharmacodynamics and pharmacokinetics, such as logP to assess lipophilicity, logS to assess solubility or pKa to assess the ionic state of a compound, along with other topological, geometrical and physicochemical descriptors (Danishuddin and Khan, 2016). While some correlations have reasonable predictive power, many descriptors have no biological meaning and can mislead the optimization process. As highlighted by Hopkins et al. (2014), high-throughput screening methods have been linked to the rise of hits with inflated physicochemical properties during the optimization process (Keserü and Makara, 2009). Also, recent reviews have shown an increase of molar mass in recent medicinal chemistry efforts (Leeson and Springthorpe, 2007), and many authors correlate this strategy with the likelihood of poor results for such compounds (Gleeson, 2008; Waring, 2009, 2010; Gleeson et al., 2011).

Many chemical moieties are regularly used in medicinal chemistry to produce chemical diversity (Bemis and Murcko, 1996; Welsch et al., 2010; Taylor et al., 2014), a practice well known as fragment-based drug design (FBDD), and their use for pharmacophore modeling and to prevent high toxicity is not recent (Gao et al., 2010). In particular, aromatic rings are extensively used in drugs due to their well-known synthetic and modification paths (Aldeghi et al., 2014).
For example, at least one aromatic ring can be found in 99% of a database containing more than 3,500 compounds evaluated by the medicinal chemistry departments of Pfizer, AstraZeneca (AZ) and GlaxoSmithKline (GSK) (Roughley and Jordan, 2011). Still, little is known about their chemical features in biological solution, such as H-bond availability, lifetime of H-bonds, solvent accessibility, and conformational ensemble. In this sense, molecular dynamics (MD) simulations can provide useful information with atomistic resolution and assess the aforementioned features of chemical groups in water, providing fundamental data to drive medicinal chemistry approaches. Still, dynamical properties of chemical moieties in biological solution are usually neglected in drug design and very difficult to access (Ferenczy and Keseru, 2010; Reynolds and Holloway, 2011; Hopkins et al., 2014). Even though MD simulations have been used in medicinal chemistry to generate different receptor conformers and to validate binding poses predicted by docking (Zhao and Caflisch, 2015; Ganesan et al., 2017), simulations of the free ligand in solution are rarely used to assess the conformational ensemble and the energies associated with solvation, due to the challenge of solving conformational flexibility and internal energies (Butler et al., 2009; Blundell et al., 2016). For a solvated ligand, the enthalpic and entropic costs of disrupting an H-bond or dismantling its entire solvation shell can be the determinant step in providing the proper energy of binding (Biela et al., 2012; Blundell et al., 2013; Mondal et al., 2014). Yet, the free energy of binding is often predicted via geometrical or alchemical transformations (Zwanzig, 1954; Aqvist et al., 1994; Woo and Roux, 2005; Gumbart et al., 2013), alongside recent developments in funnel metadynamics (Limongelli et al., 2013).
More recently, thermodynamical features of ligands have been experimentally investigated in order to enhance binding and efficiency (Freire, 2009; Ferenczy and Keseru, 2010; Reynolds and Holloway, 2011). Ligand features such as H-bond lifetime, effects of vicinity on H-bond availability and strength, accessible surface area and water structure around binding sites can provide substantial information for designing new molecular entities (Blundell et al., 2016).

Different force fields have been used for drug design purposes, such as MMFF94 (Halgren, 1996), OPLS-AA (Jorgensen et al., 1996), and GAFF (Wang et al., 2004). While these force fields parameterized their electrostatic terms using ab initio calculations, the GROMOS force fields (derived from the Groningen Molecular Simulation package) used the free energy of solvation as target (Daura et al., 1998; Oostenbrink et al., 2004) to empirically assign atomic partial charges. Thus, in this work, we have chosen the GROMOS force field to simulate the dynamical behavior of the 103 aromatic rings most commonly used in drug design (including a calibration subset of 42 molecules) and their interactions with the solvent, in order to assess thermodynamical properties in solution. These interactions, in turn, offer a reference for future rational drug design studies, as they describe in detail how several functional groups interact with their surroundings.

2. METHODS

2.1. Selection of Rings

A series of 103 aromatic rings commonly used in drug design was selected for this study (Broughton and Watson, 2004; Jordan and Roughley, 2009; Welsch et al., 2010; Taylor et al., 2014, 2017). Among them, a calibration set of 42 molecules (Table 1), for which physical-chemical properties are known, was selected from the benchmark developed by Caleman et al. (2012). Briefly, both works of Taylor et al.
(2014, 2017) employed a detailed search of substructure frequencies in the FDA Orange Book, cross-referenced with ChEMBL, DrugBank, Nature, Drug Reviews, the FDA Web site, and the Annual Reports in Medicinal Chemistry; the work of Broughton and Watson (2004) employed a search of substructure frequencies in the MDL Drug Data Report database by using a “Phase II” keyword; and the work of Welsch et al. (2010) pinpointed privileged scaffolds from natural-product works throughout the literature.

TABLE 1 | Charge groups (colored) and aromatic rings used as calibration set in this work.

2.2. Topology Construction

Structures for these aromatic rings were built using Avogadro (Hanwell et al., 2012). Molecular mechanical (MM) topological parameters such as bonds, angles, and Lennard-Jones parameters were taken from GROMOS53A6 (Oostenbrink et al., 2004). Due to the well-known good performance of MP2 methods for small aromatic rings (Li et al., 2015; Matczak and Wojtulewski, 2015), atomic partial charges were based on quantum mechanical (QM) calculations using MP2 theory (Møller and Plesset, 1934), the 6-31G* basis set (Petersson et al., 1988) and the implicit-solvent Polarizable Continuum Model (PCM) (Mennucci and Tomasi, 1997), followed by a RESP fitting (Bayly et al., 1993). The partial charges so obtained were adjusted in the MM model to reproduce the QM dipole moment of the ring. The angle θ formed between the QM and MM dipole moment vectors was monitored through an in-house script to make sure the angle had the lowest value possible, guaranteeing the conservation of the QM dipole moment direction. For our calibration set, the magnitudes of the MM partial charges were adjusted to better reproduce the physicochemical properties of the organic liquids.
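The θ check described above can be illustrated with a short Python sketch. This is not the authors' in-house script, which is not reproduced in the text; the function names, the point-charge dipole formula applied to MM charges, and the example numbers are assumptions made only for illustration.

```python
import numpy as np

# Conversion factor: 1 e*nm expressed in Debye.
E_NM_TO_DEBYE = 48.0321

def dipole_moment(charges, coords):
    """Point-charge dipole vector sum_i q_i * r_i.

    charges: partial charges in units of e, shape (N,)
    coords:  atomic positions in nm, shape (N, 3)
    Returns the dipole vector in e*nm (origin independent for a neutral molecule).
    """
    charges = np.asarray(charges, dtype=float)
    coords = np.asarray(coords, dtype=float)
    return charges @ coords

def angle_between(mu_a, mu_b):
    """Angle theta (degrees) between two dipole vectors."""
    mu_a = np.asarray(mu_a, dtype=float)
    mu_b = np.asarray(mu_b, dtype=float)
    cos_theta = np.dot(mu_a, mu_b) / (np.linalg.norm(mu_a) * np.linalg.norm(mu_b))
    # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

if __name__ == "__main__":
    # Hypothetical three-site neutral fragment (illustrative values only).
    mm_charges = [0.20, -0.45, 0.25]
    mm_coords = [[0.00, 0.00, 0.00], [0.12, 0.00, 0.00], [0.18, 0.09, 0.00]]
    mu_mm = dipole_moment(mm_charges, mm_coords)
    mu_qm = np.array([-0.010, 0.021, 0.0])  # hypothetical QM dipole in e*nm
    print("MM dipole magnitude (D):", np.linalg.norm(mu_mm) * E_NM_TO_DEBYE)
    print("theta (degrees):", angle_between(mu_qm, mu_mm))
```

In such a workflow the charges would be adjusted iteratively and θ recomputed after each change, keeping the smallest angle that still reproduces the target liquid properties.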
Following the philosophy of charge group assignment, groups were limited, at maximum, to the atoms at the ortho position on each ring. In more complex substitution patterns, a superimposition of two charge groups was required to correctly describe the chemical group. In such cases, the Coulombic terms of the overlapping atoms were adjusted to correctly describe the direction of the total dipole moment of the ring. For molecules containing linear constraints (benzonitrile), virtual sites were added in order to preserve the total moment of inertia and mass, thus preserving the linearity of these groups (Feenstra et al., 1999).

2.3. New Torsional Potentials

The quantum mechanical torsional profile of every dihedral angle was calculated using Gaussian (Frisch et al., 2016) (RRID:SCR_014897). Molecular structures were built using Avogadro (Hanwell et al., 2012) and their geometries were optimized using the Hartree-Fock method (Fock, 1930; Hartree and Hartree, 1935) and the 3-21G* basis set (Dobbs and Hehre, 1986). Afterwards, the Scan routine was used to calculate the total energy of the molecule conformation for each dihedral orientation, adopting tight convergence criteria, with geometry optimization at the MP2/6-31G* level and steps of 30°. In order to calculate the torsional profile for the molecular mechanics model, dihedral orientations were kept fixed during minimization using restraint forces for the same angles evaluated by the quantum calculations. Both profiles were submitted to the Rotational Profiler server (Rusu et al., 2014) to obtain appropriate sets of classical mechanics parameters that provided a better fit to the QM-obtained torsional profile.

2.4. General Simulation Settings

All simulations were carried out using the GROMACS 5.0.7 package (Abraham et al., 2015) (RRID:SCR_014565).
In order to create parameters compatible with the GROMOS family, we followed the settings of previous literature (Daura et al., 1998; Schuler et al., 2001; Oostenbrink et al., 2004): a twin-range scheme was used with short- and long-range cutoff distances of 0.8 and 1.4 nm, respectively. Also, the reaction-field method was applied to correct the effects of electrostatic interactions beyond the long-range cutoff distance (Barker and Watts, 1973; Tironi et al., 1995), using the experimental dielectric constant of each liquid as εRF for organic liquid simulations and εRF = 62 for simulations in water (Heinz et al., 2001; Oostenbrink et al., 2004). The LINCS algorithm (Hess et al., 1997; Hess, 2008) was used to constrain all covalent bonds, using a cubic interpolation, a Fourier grid of 0.12 nm and a timestep of 2 fs. Configurations were saved every 2 ps for analysis.

2.4.1. Organic Liquids Simulations

In order to build the organic liquid systems, cubic boxes of 2×2×2 nm were created, each with a single organic molecule. A total of 125 of these boxes were stacked, forming a single box of 10×10×10 nm with conventional periodic boundary conditions, which was simulated under high pressure (100 bar) to induce the liquid phase. The systems were then simulated and equilibrated at 1 bar. Afterwards, the boxes were staggered to obtain systems with 1000 molecules in the liquid phase and simulated at 1 bar until the total energy drift converged to values below 0.5 J/(mol×ns×Degrees of Freedom). Such a criterion is necessary to make sure that the fluctuating properties can be accurately calculated (Caleman et al., 2012). All simulations were carried out with the Berendsen pressure and temperature coupling algorithms due to their efficiency in molecular relaxation (Berendsen et al., 1984), using τT = 0.2 ps and τP = 0.5 ps. When available, experimental values of isothermal compressibility and dielectric constant were used as additional parameters for the liquid simulations. Otherwise, the compressibility of the most chemically similar molecule was used. The experimental dielectric constants of each liquid were also used as parameters in the simulations (Oostenbrink et al., 2004).

In order to calculate the densities of the liquids (ρ), simulations at constant pressure were carried out for 10 ns and ρ was calculated using block averages over 5 blocks. Enthalpies of vaporization (ΔHvap) were calculated by block averaging the same 10 ns of liquid simulation to obtain Epot(l), and another 100 ns of gas-phase simulation using a stochastic dynamics (SD) integrator (Van Gunsteren and Berendsen, 1988) with a single molecule in vacuum to obtain Epot(g), according to the equation:

$$\Delta H_{vap} = \left(E_{pot}(g) + k_B T\right) - E_{pot}(l) \tag{1}$$

Aiming to calculate the dielectric constant (ε), the simulations of the liquid boxes from which ρ was obtained were extended up to 60 ns. Convergence of ε was monitored using running averages, and ε was evaluated only after convergence. In order to calculate thermal expansion coefficients (αP) and classic isobaric heat capacities (CPcla), three constant-pressure simulations were carried out for 5 ns each, at temperatures T, T+10 K, and T−10 K, for each liquid. The calculations of αP and CPcla were done using the finite difference method (Kunz and van Gunsteren, 2009):

$$\alpha_P \approx \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_P \approx -\frac{\ln\langle\rho\rangle_{T_2} - \ln\langle\rho\rangle_{T_1}}{T_2 - T_1} \tag{2}$$

and:

$$C_P \approx \left(\frac{\partial U}{\partial T}\right)_P \approx \frac{\langle U\rangle_{T_2} - \langle U\rangle_{T_1}}{T_2 - T_1} \tag{3}$$

In order to calculate isothermal compressibilities (κT), three constant-volume simulations were carried out for 5 ns each, with pressures of 1, 0.9, and 1.1 bar. The calculation of κT was also done using the finite difference method:

$$\kappa_T \approx -\frac{1}{V}\left(\frac{\partial V}{\partial P}\right)_T \approx \frac{\ln\rho_2 - \ln\rho_1}{\langle P\rangle_{\rho_2} - \langle P\rangle_{\rho_1}} \tag{4}$$

2.4.2. Solvation Free Energy Simulations

Simulations in water were carried out to evaluate the solvation free energies (ΔGhyd) of 30 molecules at 1 bar and 298 K.
Each aromatic ring (solute) was centered in a cubic box with appropriate dimensions to reproduce the density of the SPC water model (0.997 g/cm3). In free-energy calculations using the thermodynamic integration (TI) method, a coupling parameter λ is used to perturb the solute-solvent interactions:

$$\Delta G_{sim} = \int_0^1 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda} d\lambda \tag{5}$$

in which H is the Hamiltonian, λ = 0 refers to the state in which the solute fully interacts with the solvent and λ = 1 refers to the state in which the solute-solvent interactions do not exist. In our setup, Coulombic interactions were decoupled first, and the Lennard-Jones interactions afterwards, using a soft-core potential to avoid issues related to strong Lennard-Jones interactions (Beutler et al., 1994). The soft-core power was set to 1 and αLJ was set to 0.5, following the recommendations of Shirts and Pande (2005). Both interactions were decoupled using the λ values 0, 0.02, 0.04, 0.07, 0.1, 0.15, 0.2, ..., 0.8, 0.85, 0.9, 0.93, 0.96, 0.98, 1, totaling 50 λ simulations.

Our simulation protocol consisted of an initial steepest-descent minimization, followed by an L-BFGS minimization until a maximum force of 10 kJ mol⁻¹ nm⁻¹ was reached.

TABLE 2 | Dataset of aromatic rings evaluated in this work. Heteroatoms are highlighted in colors.

Afterwards, initial velocities were assigned and the systems were equilibrated for 100 ps in an NVT ensemble at each λ. The systems were then subjected to another 100 ps of equilibration in an NPT ensemble, using the Parrinello-Rahman pressure coupling algorithm (Parrinello and Rahman, 1981), a time constant for coupling of τt = 5 ps and a compressibility of 4.5 × 10⁻⁵ bar⁻¹. Finally, production simulations were done using the Langevin integrator (Van Gunsteren and Berendsen, 1988) to sample ⟨∂H/∂λ⟩λ until convergence. Therefore, simulation times varied between 1 and 5 ns.
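To make Equation (5) concrete, the following Python sketch integrates per-window averages of ∂H/∂λ over a λ schedule. Two points are assumptions rather than statements from the text: the elided middle of the λ list is taken to be evenly spaced by 0.05 (which gives 25 windows per interaction type and is consistent with the 50 simulations reported), and the trapezoidal rule is used because the integration scheme is not specified. The function names are hypothetical.

```python
import numpy as np

def lambda_schedule():
    """Lambda windows for one interaction type (Coulomb or Lennard-Jones).

    Assumption: the elided part of the published list is spaced by 0.05.
    """
    head = [0.0, 0.02, 0.04, 0.07, 0.1]
    middle = [round(0.15 + 0.05 * i, 2) for i in range(16)]  # 0.15 ... 0.90
    tail = [0.93, 0.96, 0.98, 1.0]
    return np.array(head + middle + tail)

def ti_free_energy(lambdas, dhdl_means):
    """Trapezoidal estimate of the integral of <dH/dlambda> from lambda=0 to 1.

    lambdas:    increasing lambda values, shape (N,)
    dhdl_means: ensemble average of dH/dlambda at each window (kJ/mol), shape (N,)
    """
    lambdas = np.asarray(lambdas, dtype=float)
    dhdl = np.asarray(dhdl_means, dtype=float)
    widths = np.diff(lambdas)
    return float(np.sum(0.5 * (dhdl[1:] + dhdl[:-1]) * widths))

if __name__ == "__main__":
    lams = lambda_schedule()
    # Hypothetical smooth <dH/dlambda> curve, for illustration only.
    fake_dhdl = 20.0 * (1.0 - lams) ** 2
    print(len(lams), "windows; integral =", ti_free_energy(lams, fake_dhdl), "kJ/mol")
```

With this sign convention (λ = 0 fully interacting), the Coulomb and Lennard-Jones legs would each be integrated separately and summed to give the decoupling free energy.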
In addition, the last frame of the production phase of each λ was used as input for the subsequent λ.

2.4.3. Simulation of Rings in Water

After an extensive comparison of simulated and experimental physicochemical properties of our calibration set and its consequent validation, the same strategy of topology construction was applied to another 61 rings commonly used in drug design (Table 2) for which experimental properties are not available, totaling 103 aromatic rings in this study. Hence, in order to evaluate the chemical features and interactions of aromatic rings with their surroundings, the total set of 103 aromatic rings was simulated in water, including all 42 molecules present in the calibration set (Table 1). Each solute was placed in a cubic box with a distance of 1.0 nm to its edges. The boxes were then filled with SPC water and minimized to eliminate any possible clashes, until convergence at a maximum force of 0.1 kJ/(mol×nm). Afterwards, the system was equilibrated in an NVT ensemble at 298.15 K using the Nosé-Hoover algorithm (Nosé, 1984) for temperature coupling. Production runs of 250 ns were carried out with temperature and pressure coupling handled by the V-rescale (Bussi et al., 2007) and Parrinello-Rahman (Parrinello and Rahman, 1981) algorithms, using τT = 0.1 ps and τP = 2.0 ps. The GROMACS tools hbond, rdf, and sorient were used to calculate H-bond related properties and the solvation structure around the heteroatoms, using a block-averaging approach over 5 blocks of 50 ns.

FIGURE 1 | Evaluation of torsional parameters and dihedral distribution. QM and adjusted MM torsional profiles are shown in black and green, respectively. In red, the dihedral distribution during simulations.

3. RESULTS

3.1. New Torsional Profiles

In order to accurately describe the torsional angles of the selected aromatic rings, a total of 15 new dihedral potentials were derived by fitting the MM profiles to the corresponding QM-calculated ones (Table S1). Fittings were conducted using the Rotational Profiler server (Rusu et al., 2014). In all cases, the new parameters yielded minimum and barrier amplitudes almost identical to those calculated by QM (Figure 1). The dihedral distribution throughout the simulations was also evaluated.

3.2. Physical-Chemical Properties

In order to validate our strategy of topology building, boxes of organic liquids were simulated to obtain physical-chemical properties for each compound. Reference experimental values (Table S2) were used to calculate the absolute error of each property and to guide adjustments of the Coulombic terms in order to mitigate deviations. We calculated the θ angle between the QM and MM dipole moments, and the final version of our calibration set (Table 1) yielded an average θ angle of 2.5° ± 6.1°, suggesting that our MM models conserve the direction of the QM dipole moment, preserving the electrostatic potential of each molecule.

Following the GROMOS philosophy (Oostenbrink et al., 2004; Horta et al., 2016), density (ρ), enthalpy of vaporization (ΔHvap), and free energy of solvation (ΔGhyd) were used as targets for the parametrization, while the thermal expansion coefficient (αP), isothermal compressibility (κT), dielectric constant (ε), and classic isobaric heat capacity (CPcla) were calculated as benchmarks for GROMOS performance and compared with the results obtained by Caleman et al. (2012) and Horta et al. (2016) (Table 3). Linear regressions between experimental and simulated values were calculated in order to assess the predictive power of the employed strategy (Figure 2). The equations reported below were calculated excluding outliers (values higher than 2 standard deviations).
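The regression procedure can be sketched in Python as follows. The text does not say which quantity the 2-standard-deviation criterion is applied to; this sketch assumes it is the residual from a first-pass linear fit, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def fit_excluding_outliers(x, y, n_sigma=2.0):
    """Linear fit y = slope*x + intercept after dropping outliers.

    Assumption: a point is an outlier when its residual from an initial
    least-squares fit deviates from the mean residual by more than
    n_sigma standard deviations.
    Returns (slope, intercept, r, keep_mask).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    keep = np.abs(residuals - residuals.mean()) <= n_sigma * residuals.std()
    # Refit and compute Pearson's R using only the retained points.
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    r = float(np.corrcoef(x[keep], y[keep])[0, 1])
    return float(slope), float(intercept), r, keep

if __name__ == "__main__":
    # Synthetic experimental vs. simulated values with one deliberate outlier.
    exp = np.arange(10, dtype=float)
    sim = exp.copy()
    sim[5] += 10.0
    slope, intercept, r, keep = fit_excluding_outliers(exp, sim)
    print("slope:", slope, "intercept:", intercept, "R:", r, "kept:", int(keep.sum()))
```

Reporting both the full-set and the outlier-free statistics, as done for ΔGhyd in the text, makes it clear how much the excluded points drive the deviation.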
Regarding the targeted properties, our calibration set yielded the equations y = 0.9118x + 0.1001 for density, y = 1.0699x − 1.6491 for enthalpy of vaporization and y = 0.8676x + 0.8929 for free energy of solvation, with correlation coefficients of R = 0.92, R = 0.96, and R = 0.89, respectively. In terms of average deviation (AVED), our calibration set overestimates ρ by 0.008 g/cm3 and ΔHvap by 1.51 kJ/mol, and underestimates ΔGhyd by 3.35 kJ/mol. Without the outliers, the AVED for ΔGhyd improves to 2.83 kJ/mol.

TABLE 3 | Average deviation between experimental and simulated physicochemical properties of aromatic rings evaluated in our calibration set. Simulated GAFF and OPLS-AA values were obtained from Caleman et al. (2012) and 2016H66 values from Horta et al. (2016). Density (ρ) in g/cm3, enthalpy of vaporization (ΔHvap) in kJ/mol, thermal expansion coefficient (αP) in 10⁻³/K, isothermal compressibility (κT) in 1/GPa, dielectric constant (ε), classic isobaric heat capacity (Cpcla) in J/mol×K, and free energy of solvation (ΔGhyd) in kJ/mol.

Property   Force field   N    Average Dev.   St. Dev.   R coefficient
ρ          This work     42   0.008          0.051      0.92
           2016H66       6    0.016          0.019      0.99
           GAFF          40   −0.008         0.045      0.93
           OPLS-AA       40   0.001          0.025      0.98
ΔHvap      This work     42   1.514          4.457      0.96
           2016H66       6    2.257          6.758      0.96
           GAFF          40   2.298          5.419      0.88
           OPLS-AA       40   3.243          5.216      0.90
Cpcla      This work     42   88.201         33.440     0.77
           2016H66       6    98.712         35.232     0.63
           GAFF          37   133.884        40.225     0.84
           OPLS-AA       37   129.397        35.330     0.91
αP         This work     42   0.146          0.210      0.82
           2016H66       6    0.171          0.148      0.91
           GAFF          40   0.224          0.220      0.58
           OPLS-AA       40   0.155          0.210      0.64
κT         This work     42   0.046          0.500      0.70
           2016H66       6    0.276          0.279      0.71
           GAFF          40   0.054          0.150      0.77
           OPLS-AA       40   −0.016         0.130      0.78
ε          This work     42   −4.523         5.650      0.65
           2016H66       6    −2.217         2.515      0.89
           GAFF          29   −4.254         2.740      0.97
           OPLS-AA       33   −4.564         5.600      0.72
Non-targeted properties were calculated to evaluate how they behaved in our simulations. Linear regressions yielded the equations y = 0.93825x + 0.1406 for αP (R = 0.82), y = 0.90079x − 0.0140 for κT (R = 0.70), y = 0.2581x + 1.8961 for ε (R = 0.65), and y = 0.8989x + 100.5 for Cpcla (R = 0.77). In terms of AVED, αP is overestimated by 0.14 × 10⁻³/K and κT is overestimated by 0.0465 1/GPa. As expected (Caleman et al., 2012; Horta et al., 2016), ε is poorly described due to the lack of polarization effects, resulting in an underestimation of 4.52 in the dielectric constant. On the other hand, Cpcla was overestimated by 88.2 J/mol×K, a behavior aligned with recent works in the literature (Caleman et al., 2012; Horta et al., 2016). Individual AVED and absolute errors can be found in Tables S4, S5 in the Supplementary Material, along with experimental properties in Table S3.

3.3. Interactions in Water

In order to quantitatively evaluate the behavior of heteroaromatic rings in water and their interactions with the aqueous surroundings, several properties were calculated throughout 250 ns of simulation. From these calculations, we were able to assess the average number of H-bonds (AverHB) of each heteroatom along with its residence time (τHB), lifetime (lifetimeHB), the free energy of breakage of an H-bond (ΔGHB), and the percentage of simulation time in which a given heteroatom was involved in at least one H-bond (Percent). We were also able to obtain the optimal binding distance between a heteroatom and water (OBDHB), along with the coordination number (CNHB) at the OBDHB and the average orientation of the water molecules surrounding the heteroatom. These data are compiled in Tables 4, 5.

4. DISCUSSION

4.1. Topology Building Strategy

The accurate description of the chemical diversity of organic compounds, mainly in the context of drugs and medicinal chemistry, is a challenging task in molecular mechanics, since it must be described as broadly as possible by the force field fragments.
However, the most common sets of MM parameters employed in biomolecular simulations are usually centered on the monomeric constituents of biopolymers and lipids, while parameters for synthetic compounds, as well as for other common non-polymeric biological molecules (e.g., natural products), must be included from specific calculations or external sets of parameters. In this sense, a proper description of torsional terms will directly impact the dynamical behavior of these small molecules, even considering that, when evaluating ligand-receptor complexes, the influence of these terms might be mitigated by the restricted movement of the ligand inside the binding pocket. Still, the accommodation of poses derived from flexible docking, the fine-tuning of induced fit, and the characterization of ligand conformational induction vs. selection (with potential inferences on the entropic costs of binding) require dihedral potentials specifically adjusted to organic compounds.

FIGURE 2 | Correlation between experimental and calculated physical-chemical properties of organic liquids for the 42 aromatic compounds in the calibration set. Standard deviations are shown as bars, linear regressions are shown in green and empty dots represent outliers.

Hence, new parameters were generated in this work exclusively for 15 dihedrals in aromatic rings of our calibration set (Figure 1). In general, our results revealed that our MM parameters yielded a good description of the QM torsional profile, with the exceptions of [16] thiophenol, [42] phenoxybenzene, [24] phenylmethanol, and [18] trifluoromethylbenzene. For these molecules, the distribution profile was almost evenly spread, most likely due to the low energy barrier (below 2.5 kJ/mol), indicating that transient states are commonly achieved during our simulations in the SPC water model.
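A quick worked example helps to see why a barrier of this height gives a nearly flat dihedral distribution. The Python sketch below compares the barrier with the thermal energy RT at 298.15 K, the temperature used for the water simulations. It is only an order-of-magnitude illustration based on a simple Boltzmann factor, an assumption made here rather than an analysis performed in the text, and the function name is hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def boltzmann_factor(barrier_kj_mol, temperature_k=298.15):
    """Relative population exp(-E/RT) of a state lying E above the minimum."""
    return math.exp(-barrier_kj_mol / (R_KJ_PER_MOL_K * temperature_k))

if __name__ == "__main__":
    rt = R_KJ_PER_MOL_K * 298.15
    print("RT at 298.15 K (kJ/mol):", rt)
    print("Relative population at a 2.5 kJ/mol barrier top:", boltzmann_factor(2.5))
```

Because the barrier is comparable to RT, orientations near the barrier top remain substantially populated relative to the minimum, which is consistent with the evenly spread distributions described above.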
Simulations of these particular molecules in vacuum revealed little influence of water solvation on the dihedral profile (data not shown).

In another sense, the choice of an atomic charge set for ligands can drastically impact thermodynamical binding properties such as complexation free energy and desolvation. Therefore, we employed in this work a dipole-moment-based strategy to describe the Coulombic contribution, using physicochemical properties of organic liquids as target. The predictive power of our strategy was compared to recent benchmarks of aromatic compounds in the liquid phase (Caleman et al., 2012; Horta et al., 2016) and is summarized in Table 3. In general, our calibration set yielded similar or lower average deviations than the benchmarks made with the OPLS-AA, GAFF, and 2016H66 sets for all physicochemical properties evaluated in this work. The main difference was in terms of Cpcla, which GAFF and OPLS-AA overestimate by nearly 40 J/mol×K more than our parameters. Still, all four parameter sets overestimate Cpcla.

In addition, the GROMOS53A5 force field was designed to reproduce physicochemical properties, and was later adjusted to reproduce free energies of solvation and hydration (GROMOS53A6) (Oostenbrink et al., 2004). The average deviations in density, enthalpy of vaporization and free energy of solvation for GROMOS53A5 were 0.0389 g/cm3, −0.4 and 3.8 kJ/mol, respectively. These values are very similar to our results, as shown in Table 3, reiterating the quality of our parameters. It is important to mention that the employed benchmark set was built using the same Lennard-Jones parameters used in the benzene ring of phenylalanine in GROMOS53A6. While GROMOS53A6 produces a ΔGhyd = 0.0 kJ/mol for benzene

TABLE 4 | Properties of heteroaromatic rings in water.
Average H-bonds (AverHB), H-bond residence time (τHB) is ps, H-bond lifetime (lifetimeHB) in 1/ps, free-energy of H-bond breakage ( GHB) in kJ/mol, percentage of simulation with at least one formed H-bond (Percent.), coordination number of water (CN), optimal binding distance with water (OBDHB) in nm, and overall water orientation around the heteroatom (Orientation). Molecule Atom AverHB τHB lifetimeHB GHB Percent CN OBDHB Orientation Water Ow OH1 OH2 1.73 ± 0.62 0.87 ± 0.35 0.86 ± 0.35 2.11 ± 0.02 1.80 ± 0.03 1.83 ± 0.03 0.47 ± 0.00 0.55 ± 0.01 0.54 ± 0.01 6.38 ± 0.03 5.98 ± 0.05 6.03 ± 0.04 98.58 86.25 86.07 4.11 ± 2.83 4.11 ± 2.83 4.11 ± 2.83 0.18 ± 0.00 0.18 ± 0.00 0.18 ± 0.00 Undefined O-oriented O-oriented Phenol O 1.10 ± 0.62 1.61 ± 0.03 0.62 ± 0.01 5.70 ± 0.04 85.96 1.46 ± 1.03 0.18 ± 0.00 Undefined OH 0.96 ± 0.20 9.49 ± 0.18 0.11 ± 0.00 10.11 ± 0.05 96.04 0.90 ± 0.01 0.17 ± 0.00 O-oriented Phenylmethanol O 1.42 ± 0.58 2.58 ± 0.03 0.39 ± 0.00 6.88 ± 0.02 96.51 2.68 ± 1.59 0.18 ± 0.00 Undefined OH 0.95 ± 0.24 5.37 ± 0.06 0.19 ± 0.00 8.70 ± 0.03 94.25 1.13 ± 0.01 0.17 ± 0.00 O-oriented 2-methylphenol O 1.04 ± 0.59 1.88 ± 0.04 0.53 ± 0.01 6.09 ± 0.05 84.80 1.05 ± 0.00 0.18 ± 0.00 Undefined OH 0.95 ± 0.23 9.46 ± 0.17 0.11 ± 0.00 10.10 ± 0.04 94.53 0.87 ± 0.01 0.17 ± 0.00 O-oriented 3-methylphenol O 1.08 ± 0.61 1.74 ± 0.02 0.58 ± 0.01 5.90 ± 0.03 85.83 1.43 ± 1.00 0.18 ± 0.00 Undefined OH 0.96 ± 0.19 10.12 ± 0.19 0.10 ± 0.00 10.27 ± 0.05 96.30 0.90 ± 0.01 0.17 ± 0.00 O-oriented 4-methylphenol O 1.08 ± 0.61 1.73 ± 0.02 0.58 ± 0.01 5.89 ± 0.03 85.70 1.10 ± 0.01 0.18 ± 0.00 Undefined OH 0.96 ± 0.20 10.00 ± 0.21 0.10 ± 0.00 10.24 ± 0.05 96.21 0.90 ± 0.01 0.17 ± 0.00 O-oriented Benzenethiol S 0.67 ± 0.65 0.38 ± 0.01 2.63 ± 0.05 2.13 ± 0.04 57.29 0.81 ± 0.17 0.23 ± 0.00 Undefined SH 0.77 ± 0.43 1.00 ± 0.02 1.00 ± 0.02 4.52 ± 0.05 76.38 2.08 ± 0.02 0.23 ± 0.00 O-oriented Aniline N NH1 NH2 0.93 ± 0.58 0.63 ± 0.49 0.63 ± 0.50 1.64 ± 0.02 1.15 ± 0.03 0.99 ± 0.02 0.61 ± 
0.01 0.87 ± 0.02 1.01 ± 0.02 5.75 ± 0.03 4.87 ± 0.06 4.51 ± 0.04 79.89 62.48 62.05 1.01 ± 0.01 1.25 ± 0.38 1.39 ± 0.25 0.19 ± 0.00 0.22 ± 0.00 0.23 ± 0.00 Undefined O-oriented O-oriented 2-chloroaniline N NH1 NH2 Cl 0.86 ± 0.50 0.51 ± 0.51 0.56 ± 0.51 0.24 ± 0.45 2.29 ± 0.04 1.00 ± 0.03 0.87 ± 0.03 0.32 ± 0.08 0.44 ± 0.01 1.00 ± 0.03 1.15 ± 0.04 3.26 ± 0.68 6.59 ± 0.05 4.53 ± 0.06 4.18 ± 0.09 1.66 ± 0.56 79.39 50.60 55.20 22.67 0.92 ± 0.00 0.19 ± 0.00 1.33 ± 0.14 0.23 ± 0.00 3.82 ± 5.29 0.23 ± 0.01 18.94 ± 10.05 0.36 ± 0.00 Undefined O-oriented O-oriented Undefined Pyridine Pyrimidine N 1.41 ± 0.71 1.33 ± 0.02 0.75 ± 0.01 5.24 ± 0.03 91.46 1.59 ± 0.01 0.20 ± 0.00 Undefined N1 1.06 ± 0.68 0.91 ± 0.02 1.10 ± 0.02 4.30 ± 0.05 80.71 1.23 ± 0.01 0.20 ± 0.00 Undefined N2 0.98 ± 0.68 0.81 ± 0.02 1.24 ± 0.03 4.00 ± 0.06 76.96 1.17 ± 0.01 0.20 ± 0.00 Undefined 2-methylpyridine N 1.52 ± 0.70 1.74 ± 0.04 0.57 ± 0.01 5.90 ± 0.06 93.96 1.68 ± 0.00 0.20 ± 0.00 Undefined 3-methylpyridine N 1.43 ± 0.71 1.34 ± 0.04 0.74 ± 0.02 5.26 ± 0.07 91.65 1.61 ± 0.01 0.20 ± 0.00 Undefined 4-methylpyridine N 1.46 ± 0.71 1.44 ± 0.04 0.69 ± 0.02 5.44 ± 0.07 92.57 1.62 ± 0.01 0.20 ± 0.00 Undefined 2,4,6-trimethylpyridine N 0.36 ± 0.53 0.36 ± 0.04 2.79 ± 0.31 2.00 ± 0.28 33.67 24.48 ± 3.47 0.42 ± 0.09 Undefined Quinoline N 1.64 ± 0.68 2.00 ± 0.05 0.50 ± 0.01 6.25 ± 0.06 96.10 1.78 ± 0.01 0.19 ± 0.00 Undefined Isoquinoline N 1.26 ± 0.68 1.22 ± 0.04 0.82 ± 0.02 5.02 ± 0.07 88.67 1.43 ± 0.01 0.20 ± 0.00 Undefined Benzonitrile N 1.63 ± 0.72 1.30 ± 0.01 0.77 ± 0.01 5.17 ± 0.02 95.50 1.88 ± 0.01 0.19 ± 0.00 Undefined Furan O 0.42 ± 0.57 0.29 ± 0.01 3.41 ± 0.07 1.49 ± 0.05 37.99 31.54 ± 2.60 0.46 ± 0.01 Undefined Tiophene S 0.15 ± 0.37 0.25 ± 0.03 4.07 ± 0.49 1.07 ± 0.32 14.03 18.25 ± 5.37 0.37 ± 0.00 Undefined Pyrrole Fluorobenzene NH 0.92 ± 0.29 3.80 ± 0.06 0.26 ± 0.00 7.84 ± 0.04 91.73 0.38 ± 0.00 0.18 ± 0.00 O-oriented N 0.74 ± 0.67 1.33 ± 0.03 0.75 ± 0.02 5.23 ± 0.06 60.90 0.62 ± 0.01 0.23 ± 0.00 
Undefined F1 0.30 ± 0.49 0.30 ± 0.03 3.35 ± 0.32 1.54 ± 0.24 27.84 13.84 ± 5.27 0.36 ± 0.01 Undefined (Continued) Frontiers in Pharmacology | www.frontiersin.org 9 April 2018 | Volume 9 | Article 395 Polêto et al. Aromatic Rings Interactions in Aqueous Solution TABLE 4 | Continued Molecule 1,2-difluorobenzene 1,3-difluorobenzene 1,2,3,4-tetrafluorobenzene 1,2,3,5-tetrafluorobenzene Trifluoromethylbenzene Atom F1 F2 F1 F3 F1 F2 F3 F4 F1 F2 F3 F5 F1 F2 F3 AverHB τHB lifetimeHB GHB Percent CN OBDHB Orientation 0.24 ± 0.45 0.33 ± 0.07 3.14 ± 0.61 1.74 ± 0.49 0.24 ± 0.45 0.34 ± 0.07 3.08 ± 0.64 1.79 ± 0.51 22.91 22.90 12.15 ± 2.45 0.37 ± 0.01 13.31 ± 3.02 0.37 ± 0.01 Undefined Undefined 0.23 ± 0.45 0.36 ± 0.10 2.91 ± 0.58 1.94 ± 0.59 0.23 ± 0.45 0.32 ± 0.04 3.20 ± 0.36 1.66 ± 0.29 22.23 22.22 14.99 ± 5.60 0.36 ± 0.01 11.70 ± 1.33 0.36 ± 0.00 Undefined Undefined 0.17 ± 0.38 0.18 ± 0.40 0.18 ± 0.40 0.16 ± 0.38 0.36 ± 0.08 0.42 ± 0.15 0.33 ± 0.04 0.43 ± 0.21 2.88 ± 0.61 2.66 ± 0.78 3.11 ± 0.45 2.72 ± 0.83 1.97 ± 0.53 2.22 ± 0.79 1.74 ± 0.34 2.20 ± 0.98 16.08 17.44 17.31 16.05 12.99 ± 3.94 12.72 ± 2.45 13.89 ± 3.36 11.25 ± 1.36 0.37 ± 0.01 0.37 ± 0.01 0.37 ± 0.01 0.36 ± 0.00 Undefined Undefined Undefined Undefined 0.17 ± 0.39 0.16 ± 0.37 0.17 ± 0.39 0.21 ± 0.43 0.37 ± 0.10 0.48 ± 0.25 0.44 ± 0.10 0.33 ± 0.06 2.84 ± 0.64 2.47 ± 0.82 2.40 ± 0.53 3.17 ± 0.54 2.01 ± 0.61 2.47 ± 1.03 2.42 ± 0.56 1.70 ± 0.44 16.51 15.19 16.59 20.59 12.34 ± 3.05 12.79 ± 3.60 14.91 ± 5.13 11.51 ± 1.36 0.36 ± 0.01 0.37 ± 0.01 0.37 ± 0.01 0.36 ± 0.01 Undefined Undefined Undefined Undefined 0.10 ± 0.30 0.10 ± 0.30 0.10 ± 0.30 1.64 ± 1.99 0.64 ± 0.37 2.82 ± 5.74 1.26 ± 0.91 1.95 ± 0.73 1.05 ± 0.41 4.66 ± 2.10 3.10 ± 1.16 5.00 ± 2.38 9.56 14.84 ± 3.37 0.39 ± 0.02 Undefined 9.66 15.37 ± 3.44 0.40 ± 0.02 Undefined 9.54 14.94 ± 3.23 0.40 ± 0.02 Undefined 1-chloronaphthalene Cl 0.37 ± 0.55 0.28 ± 0.02 3.55 ± 0.21 1.39 ± 0.15 33.96 15.53 ± 7.71 0.36 ± 0.00 Undefined 1-phenylethanone O 0.87 ± 0.66 0.56 ± 0.01 
1.79 ± 0.04 3.09 ± 0.05 71.14 1.08 ± 0.01 0.19 ± 0.00 Undefined Benzaldehyde Nitrobenzene Methylbenzoate O 1.03 ± 0.66 0.78 ± 0.01 1.28 ± 0.02 3.91 ± 0.04 80.79 1.22 ± 0.01 0.18 ± 0.00 Undefined O1 0.14 ± 0.36 0.43 ± 0.13 2.51 ± 0.67 2.34 ± 0.72 13.79 13.60 ± 4.08 0.38 ± 0.01 Undefined O2 0.14 ± 0.36 0.46 ± 0.15 2.37 ± 0.62 2.48 ± 0.73 13.81 16.48 ± 5.81 0.38 ± 0.01 Undefined O1 0.95 ± 0.67 0.69 ± 0.01 1.45 ± 0.02 3.62 ± 0.04 75.85 1.13 ± 0.01 0.19 ± 0.00 Undefined O2 0.17 ± 0.38 0.28 ± 0.05 3.63 ± 0.58 1.37 ± 0.43 16.40 25.27 ± 2.00 0.45 ± 0.00 Undefined 2-hydroxy-methylbenzoate O O1 O2 OH 0.96 ± 0.58 0.94 ± 0.64 0.12 ± 0.33 0.05 ± 0.22 1.48 ± 0.02 1.07 ± 0.17 0.36 ± 0.23 0.21 ± 0.04 0.67 ± 0.01 0.96 ± 0.15 3.43 ± 1.03 4.83 ± 0.80 5.51 ± 0.03 4.65 ± 0.39 1.65 ± 1.10 0.66 ± 0.46 81.17 76.43 12.11 5.25 1.07 ± 0.00 1.07 ± 0.13 23.62 ± 8.28 0.40 ± 0.01 0.18 ± 0.00 0.18 ± 0.00 0.33 ± 0.11 0.18 ± 0.01 Undefined Undefined Undefined O-oriented Methoxybenzene 1,2-dimethoxybenzene O 0.36 ± 0.51 0.33 ± 0.02 3.03 ± 0.14 1.78 ± 0.12 34.78 0.42 ± 0.01 0.20 ± 0.00 H-oriented O1 0.39 ± 0.54 0.38 ± 0.02 2.62 ± 0.15 2.14 ± 0.14 36.51 20.52 ± 10.11 0.28 ± 0.08 H-oriented O1 0.39 ± 0.54 0.38 ± 0.05 2.64 ± 0.32 2.14 ± 0.32 36.49 12.48 ± 11.98 0.24 ± 0.00 H-oriented Phenoxybenzene O 0.32 ± 0.49 0.29 ± 0.02 3.40 ± 0.18 1.50 ± 0.13 31.12 5.69 ± 10.11 0.23 ± 0.01 Undefined Colors represent different functional groups: red for oxygen, blue for nitrogen, orange for sulfur and green for halogen containing groups. (phenylalanine side-chain), our benzene parameters yield a Ghyd = −3.4 kJ/mol, a much closer value to the experimental data ( Ghyd = −3.6 kJ/mol). Nevertheless, the AVED value reveals a underestimation for free energy of hydration in our parameter set. 
A possible reason is that chemical functions such as nitro, fluorine, chlorine, and aldehydic carbonyls are not commonly found in biomolecules and, therefore, the LJ parameters used in GROMOS53A6 may not be properly extrapolated to synthetic compounds. Moreover, we have tested the ether oxygen LJ parameters reported in Horta et al. (2011) in our pure liquid simulations of [2]furan and [23]methoxybenzene, leading to approximately the same behavior in their respective physical-chemical properties (data not shown).

4.2. Properties in Solution: Influence of Nearby Substitutions in H-Bonds

In order to obtain quantitative information regarding how aromatic rings interact with their surroundings, we performed molecular dynamics simulations for 103 aromatic rings most commonly used in drug design, including our 42 molecules

TABLE 5 | Properties of heteroaromatic rings in water. Average H-bonds (AverHB), H-bond residence time (τHB) in ps, H-bond lifetime (lifetimeHB) in 1/ps, free-energy of H-bond breakage (ΔGHB) in kJ/mol, percentage of simulation with at least one formed H-bond (Percent.), coordination number of water (CN), optimal binding distance with water (OBDHB) in nm, and overall water orientation around the heteroatom (Orientation).
Molecule Atom AverHB τHB lifetimeHB GHB Percent CN OBDHB Orientation Water Imidazole Thiazole Benzopyrrole Tetrazole Benzeimidazole 7,8-dihydro-1H-purine 1,2,4 - Triazole Quinazoline 1H-pyrimidin-2-one 4-quinolone Isoxazole Uracil Ow OH1 OH2 N1 N1 H N3 S1 N3 N1 N1 H N4 N3 N2 N1 N1 H N1 N1 H N3 N6 H N1 N6 N4 N1 H N3 N2 N1 N1 H N4 N1 N3 O2 N1 H N1 N3 O4 N1 N1 H O1 N2 N3 N3 H O2 O4 N1 N1 H 1.73 ± 0.62 0.87 ± 0.35 0.86 ± 0.35 2.11 ± 0.02 1.80 ± 0.03 1.83 ± 0.03 0.47 ± 0.00 0.55 ± 0.01 0.54 ± 0.01 6.38 ± 0.03 5.98 ± 0.05 6.03 ± 0.04 0.08 ± 0.27 0.56 ± 0.52 1.30 ± 0.72 0.33 ± 0.09 0.35 ± 0.00 1.01 ± 0.01 3.27 ± 0.86 2.89 ± 0.04 0.99 ± 0.00 1.68 ± 0.69 1.90 ± 0.03 4.56 ± 0.01 0.04 ± 0.20 – – – 0.53 ± 0.60 0.37 ± 0.01 2.73 ± 0.08 2.04 ± 0.07 0.09 ± 0.29 0.28 ± 0.05 3.68 ± 0.49 1.32 ± 0.37 0.63 ± 0.51 0.66 ± 0.01 1.50 ± 0.03 3.51 ± 0.04 0.87 ± 0.69 0.89 ± 0.74 0.31 ± 0.51 0.02 ± 0.14 0.00 ± 0.00 0.54 ± 0.01 0.53 ± 0.02 0.27 ± 0.02 – – 1.85 ± 0.03 1.87 ± 0.06 3.75 ± 0.22 – – 3.00 ± 0.04 2.97 ± 0.08 1.26 ± 0.15 – – 0.06 ± 0.23 0.78 ± 0.44 1.08 ± 0.71 0.21 ± 0.03 1.21 ± 0.02 0.86 ± 0.01 4.87 ± 0.67 0.82 ± 0.01 1.16 ± 0.02 0.62 ± 0.34 5.01 ± 0.03 4.16 ± 0.04 0.42 ± 0.51 0.04 ± 0.19 0.04 ± 0.19 0.33 ± 0.51 0.96 ± 0.21 1.83 ± 0.70 0.34 ± 0.01 0.20 ± 0.02 – 0.37 ± 0.04 6.49 ± 0.11 1.97 ± 0.09 2.91 ± 0.09 5.21 ± 0.66 – 2.76 ± 0.31 0.15 ± 0.00 0.51 ± 0.02 1.88 ± 0.08 0.46 ± 0.31 – 2.02 ± 0.29 9.16 ± 0.04 6.20 ± 0.11 1.50 ± 0.72 1.53 ± 0.04 0.66 ± 0.66 0.83 ± 0.02 0.98 ± 0.15 11.47 ± 0.25 0.83 ± 0.68 0.61 ± 0.01 0.65 ± 0.02 5.59 ± 0.06 1.20 ± 0.04 4.07 ± 0.07 0.09 ± 0.00 10.58 ± 0.05 1.63 ± 0.02 3.32 ± 0.04 0.64 ± 0.63 0.49 ± 0.02 2.04 ± 0.07 2.76 ± 0.09 0.43 ± 0.56 0.31 ± 0.01 3.19 ± 0.05 1.65 ± 0.04 1.20 ± 0.72 0.41 ± 0.51 0.01 ± 0.09 0.89 ± 0.62 0.79 ± 0.01 0.33 ± 0.01 – 0.84 ± 0.00 1.27 ± 0.02 3.07 ± 0.13 – 1.19 ± 0.01 3.93 ± 0.03 1.75 ± 0.10 – 4.11 ± 0.01 1.74 ± 0.74 0.01 ± 0.10 0.66 ± 0.49 1.39 ± 0.02 – 0.80 ± 0.02 0.72 ± 0.01 – 1.26 ± 0.04 5.35 ± 0.04 – 3.96 ± 0.07 0.59 ± 0.62 
0.36 ± 0.00 2.82 ± 0.02 1.96 ± 0.02 0.60 ± 0.63 0.35 ± 0.01 2.86 ± 0.05 1.92 ± 0.04 0.02 ± 0.16 0.33 ± 0.49 0.39 ± 0.54 1.24 ± 0.71 0.01 ± 0.07 0.45 ± 0.52 1.99 ± 1.45 0.29 ± 0.01 0.28 ± 0.01 0.87 ± 0.01 – 0.37 ± 0.01 1.04 ± 0.90 3.47 ± 0.10 3.61 ± 0.13 1.15 ± 0.01 – 2.67 ± 0.07 5.38 ± 2.21 1.45 ± 0.07 1.35 ± 0.09 4.19 ± 0.02 – 2.09 ± 0.06 98.58 86.25 86.07 4.11 ± 2.83 0.18 ± 0.00 4.11 ± 2.83 0.18 ± 0.00 4.11 ± 2.83 0.18 ± 0.00 Undefined O-oriented O-oriented 7.58 55.10 87.72 26.01 ± 7.18 0.41 ± 0.01 34.04 ± 1.08 0.45 ± 0.02 7.56 ± 12.01 0.20 ± 0.00 Undefined Undefined Undefined 4.21 16.56 ± 5.11 0.38 ± 0.01 Undefined 47.16 0.62 ± 0.11 0.22 ± 0.00 Undefined 8.60 15.88 ± 2.99 0.38 ± 0.00 Undefined 61.63 21.05 ± 1.45 0.41 ± 0.01 Undefined 69.85 68.12 29.37 1.95 0.00 1.17 ± 0.02 1.09 ± 0.32 26.38 ± 7.22 21.88 ± 2.06 0.50 ± 0.03 0.21 ± 0.00 0.23 ± 0.00 0.41 ± 0.00 0.41 ± 0.01 0.24 ± 0.00 Undefined Undefined Undefined Undefined O-oriented 5.53 77.23 80.28 16.49 ± 2.30 0.40 ± 0.01 32.26 ± 2.16 0.46 ± 0.01 1.32 ± 0.01 0.20 ± 0.00 Undefined Undefined Undefined 41.30 3.73 3.70 31.14 95.83 97.65 30.55 ± 3.92 0.46 ± 0.01 16.89 ± 12.64 0.28 ± 0.00 28.25 ± 2.59 0.46 ± 0.01 24.55 ± 2.24 0.45 ± 0.01 0.79 ± 0.65 0.18 ± 0.00 3.38 ± 1.14 0.20 ± 0.00 Undefined O-oriented Undefined Undefined O-oriented Undefined 92.83 55.78 97.94 67.99 1.70 ± 0.00 2.10 ± 3.51 3.76 ± 0.00 0.97 ± 0.01 0.20 ± 0.00 0.22 ± 0.00 0.25 ± 0.03 0.20 ± 0.00 Undefined Undefined O-oriented Undefined 56.17 0.71 ± 0.21 0.21 ± 0.00 Undefined 39.34 24.86 ± 3.16 0.28 ± 0.08 H-oriented 84.99 39.74 0.73 75.00 1.42 ± 0.02 24.01 ± 4.58 19.00 ± 2.14 1.05 ± 0.00 0.20 ± 0.00 0.44 ± 0.02 0.41 ± 0.00 0.20 ± 0.00 Undefined Undefined Undefined Undefined 96.17 1.03 65.61 3.85 ± 1.52 0.19 ± 0.00 23.69 ± 2.01 0.47 ± 0.01 26.90 ± 2.65 0.47 ± 0.01 Undefined Undefined Undefined 52.43 52.00 7.34 ± 13.27 0.22 ± 0.01 Undefined 0.77 ± 0.04 0.24 ± 0.00 H-oriented 2.48 31.98 36.17 86.66 0.54 44.10 17.32 ± 1.01 25.64 ± 8.27 19.18 ± 8.50 4.08 
± 5.14 29.81 ± 2.46 30.23 ± 1.97 0.42 ± 0.01 0.43 ± 0.02 0.37 ± 0.01 0.20 ± 0.00 0.46 ± 0.01 0.47 ± 0.01 Undefined Undefined Undefined Undefined Undefined Undefined (Continued) Frontiers in Pharmacology | www.frontiersin.org 11 April 2018 | Volume 9 | Article 395 Polêto et al. Aromatic Rings Interactions in Aqueous Solution TABLE 5 | Continued Molecule Pyrazole Pyrazine 1,8-naphthyridin-4(1H)-one Atom AverHB τHB lifetimeHB GHB Percent CN OBDHB Orientation N1 H N1 N2 0.00 ± 0.00 0.00 ± 0.00 0.72 ± 0.66 − – 0.44 ± 0.01 − – 2.29 ± 0.07 − – 2.48 ± 0.08 0.00 0.00 60.69 18.32 ± 2.00 0.40 ± 0.01 17.76 ± 1.63 0.40 ± 0.00 0.96 ± 0.03 0.21 ± 0.00 Undefined Undefined Undefined N1 1.15 ± 0.66 1.15 ± 0.03 0.87 ± 0.02 4.88 ± 0.07 85.83 6.55 ± 10.50 0.19 ± 0.00 Undefined N4 1.15 ± 0.65 1.15 ± 0.02 0.87 ± 0.02 4.86 ± 0.05 85.87 6.66 ± 10.74 0.19 ± 0.00 Undefined O4 N8 N1 N1 H 1.13 ± 0.73 0.28 ± 0.48 0.05 ± 0.23 0.40 ± 0.51 0.71 ± 0.02 0.28 ± 0.02 0.40 ± 0.14 0.34 ± 0.02 1.41 ± 0.04 3.57 ± 0.28 2.81 ± 0.93 2.99 ± 0.20 3.68 ± 0.07 1.38 ± 0.19 2.10 ± 0.84 1.82 ± 0.17 81.51 26.76 5.34 39.32 1.44 ± 0.02 25.87 ± 2.34 22.03 ± 2.09 26.16 ± 2.29 0.20 ± 0.00 0.45 ± 0.01 0.44 ± 0.01 0.45 ± 0.01 Undefined Undefined Undefined Undefined Xanthine 1,2-dihydro-3H-1,2,4-triazol-3-one 1,3,4 - Thiadiazole Indoxazine 3,9-dihydro-6H-purin-6-one Benzofuran Indazole Benzothiophene Chromone 1,4-naphthoquinone 1,2,3 - Triazole N1 H O2 N7 N7 H N1 N3 O6 N9 N1 H N2 H N4 O3 N1 N2 S1 N3 N4 N2 O1 N1 H N1 N9 N9 H N3 O6 N7 O1 N2 N1 N1 H S1 O4 O1 O4 O1 N1 H N3 N2 0.40 ± 0.51 0.52 ± 0.60 0.02 ± 0.12 0.47 ± 0.52 0.02 ± 0.15 0.03 ± 0.17 0.46 ± 0.57 0.28 ± 0.47 0.38 ± 0.01 0.32 ± 0.03 – 0.43 ± 0.01 – – 0.31 ± 0.01 0.28 ± 0.03 2.66 ± 0.10 3.13 ± 0.23 – 2.33 ± 0.08 – – 3.25 ± 0.06 3.61 ± 0.36 2.10 ± 0.09 1.71 ± 0.19 – 2.43 ± 0.08 – – 1.61 ± 0.05 1.36 ± 0.25 39.37 46.33 1.51 46.50 2.33 3.09 42.64 27.40 28.47 ± 4.09 17.09 ± 8.14 25.71 ± 2.69 26.50 ± 2.03 21.48 ± 4.05 19.25 ± 5.54 8.82 ± 4.23 26.67 ± 2.40 0.46 ± 0.02 0.37 ± 
0.01 0.45 ± 0.02 0.46 ± 0.01 0.44 ± 0.01 0.41 ± 0.01 0.33 ± 0.06 0.46 ± 0.01 Undefined Undefined Undefined Undefined Undefined Undefined Undefined Undefined 0.95 ± 0.24 0.48 ± 0.52 1.21 ± 0.67 1.26 ± 0.76 0.02 ± 0.13 0.03 ± 0.16 4.50 ± 0.09 0.37 ± 0.01 1.11 ± 0.01 0.79 ± 0.01 – – 0.22 ± 0.00 2.72 ± 0.05 0.90 ± 0.01 1.27 ± 0.01 – – 8.25 ± 0.05 2.04 ± 0.05 4.80 ± 0.03 3.93 ± 0.02 – – 94.28 46.54 87.18 85.27 1.75 2.52 1.32 ± 1.05 21.76 ± 2.73 1.39 ± 0.00 1.54 ± 0.02 17.03 ± 9.51 27.49 ± 5.89 0.17 ± 0.00 0.38 ± 0.00 0.20 ± 0.00 0.20 ± 0.00 0.28 ± 0.00 0.38 ± 0.00 O-oriented Undefined Undefined Undefined O-oriented Undefined 0.02 ± 0.15 1.33 ± 0.73 1.34 ± 0.73 – 1.17 ± 0.03 1.16 ± 0.01 – 0.86 ± 0.02 0.86 ± 0.01 – 4.91 ± 0.05 4.89 ± 0.02 2.32 88.26 88.35 19.74 ± 5.86 0.39 ± 0.01 18.35 ± 13.63 0.21 ± 0.00 1.70 ± 0.02 0.21 ± 0.00 Undefined Undefined Undefined 0.69 ± 0.65 0.45 ± 0.00 2.20 ± 0.02 2.57 ± 0.02 0.74 ± 0.66 0.48 ± 0.01 2.08 ± 0.05 2.72 ± 0.06 59.11 62.08 0.94 ± 0.02 0.22 ± 0.00 Undefined 0.84 ± 0.21 0.21 ± 0.00 Undefined 0.46 ± 0.52 0.02 ± 0.14 0.03 ± 0.17 0.47 ± 0.52 0.11 ± 0.32 1.35 ± 0.77 0.57 ± 0.61 0.40 ± 0.01 – 0.71 ± 0.45 0.36 ± 0.00 0.47 ± 0.19 0.82 ± 0.03 0.44 ± 0.02 2.48 ± 0.09 – 1.99 ± 1.06 2.80 ± 0.03 2.40 ± 0.71 1.21 ± 0.04 2.28 ± 0.11 2.28 ± 0.09 – 3.23 ± 1.48 1.97 ± 0.03 2.49 ± 0.88 4.05 ± 0.08 2.49 ± 0.12 45.20 1.93 3.05 45.60 11.22 87.77 50.12 26.74 ± 2.00 22.39 ± 5.57 20.26 ± 3.47 26.51 ± 3.63 27.05 ± 1.99 1.68 ± 0.02 0.65 ± 0.15 0.45 ± 0.00 0.42 ± 0.01 0.42 ± 0.01 0.44 ± 0.02 0.46 ± 0.01 0.20 ± 0.00 0.23 ± 0.00 Undefined Undefined Undefined Undefined Undefined Undefined Undefined 0.50 ± 0.59 0.35 ± 0.01 2.90 ± 0.10 1.89 ± 0.09 44.67 23.40 ± 11.48 0.32 ± 0.11 Undefined 0.40 ± 0.55 0.17 ± 0.39 0.55 ± 0.52 0.29 ± 0.02 0.22 ± 0.02 0.45 ± 0.00 3.45 ± 0.20 4.62 ± 0.40 2.22 ± 0.02 1.46 ± 0.14 0.75 ± 0.22 2.55 ± 0.02 36.29 16.72 53.47 22.45 ± 4.12 0.42 ± 0.01 16.16 ± 2.36 0.39 ± 0.01 18.47 ± 4.54 0.40 ± 0.01 Undefined Undefined Undefined 0.14 ± 0.35 
0.36 ± 0.09 2.99 ± 0.83 1.91 ± 0.67 13.13 17.61 ± 6.64 0.37 ± 0.00 Undefined 1.17 ± 0.73 0.74 ± 0.01 1.35 ± 0.02 3.78 ± 0.04 0.14 ± 0.35 0.39 ± 0.12 2.81 ± 0.70 2.06 ± 0.71 83.02 1.46 ± 0.01 0.20 ± 0.00 Undefined 13.72 25.00 ± 1.49 0.46 ± 0.01 Undefined 0.64 ± 0.61 0.44 ± 0.01 2.27 ± 0.03 2.49 ± 0.03 0.64 ± 0.61 0.44 ± 0.01 2.25 ± 0.08 2.52 ± 0.08 56.82 57.12 0.83 ± 0.01 0.20 ± 0.00 Undefined 0.82 ± 0.02 0.20 ± 0.00 Undefined 0.82 ± 0.41 1.11 ± 0.73 1.09 ± 0.74 1.16 ± 0.02 0.78 ± 0.01 0.75 ± 0.01 0.86 ± 0.01 1.29 ± 0.02 1.34 ± 0.02 4.89 ± 0.04 3.90 ± 0.03 3.81 ± 0.03 80.97 80.38 78.90 14.14 ± 17.25 0.24 ± 0.10 1.47 ± 0.02 0.21 ± 0.00 1.47 ± 0.00 0.21 ± 0.00 O-oriented Undefined Undefined Frontiers in Pharmacology | www.frontiersin.org 12 (Continued) April 2018 | Volume 9 | Article 395 Polêto et al. TABLE 5 | Continued Molecule Pyridazine Triazine Quinoxaline Oxazole Isothiazole 1,3,4 - Oxadiazole 1,2,5 - Oxadiazole 1,2,4 - Oxadiazole 9H-purine 1,3-Thiazol-2-amine Cytosine Adenine Aromatic Rings Interactions in Aqueous Solution Atom AverHB τHB lifetimeHB GHB Percent CN OBDHB Orientation N1 0.11 ± 0.32 0.21 ± 0.00 4.73 ± 0.09 0.67 ± 0.05 11.02 16.72 ± 8.74 0.39 ± 0.03 Undefined N1 1.42 ± 0.76 1.25 ± 0.01 0.80 ± 0.01 5.09 ± 0.02 89.58 1.83 ± 0.00 0.21 ± 0.00 Undefined N2 1.41 ± 0.76 1.24 ± 0.03 0.81 ± 0.02 5.05 ± 0.07 89.42 1.83 ± 0.00 0.21 ± 0.00 Undefined N1 0.28 ± 0.48 0.29 ± 0.03 3.46 ± 0.38 1.47 ± 0.27 26.10 29.60 ± 0.96 0.45 ± 0.01 Undefined N3 0.27 ± 0.48 0.29 ± 0.03 3.48 ± 0.35 1.45 ± 0.25 25.94 29.63 ± 1.97 0.46 ± 0.00 Undefined N5 0.27 ± 0.48 0.28 ± 0.02 3.62 ± 0.24 1.34 ± 0.16 25.78 33.15 ± 1.59 0.45 ± 0.01 Undefined N4 0.34 ± 0.51 0.30 ± 0.02 3.40 ± 0.19 1.50 ± 0.14 32.01 24.66 ± 0.64 0.37 ± 0.10 Undefined N1 0.34 ± 0.51 0.30 ± 0.02 3.34 ± 0.19 1.54 ± 0.14 31.83 25.96 ± 2.58 0.33 ± 0.11 Undefined O1 0.34 ± 0.51 0.29 ± 0.01 3.51 ± 0.16 1.42 ± 0.12 31.80 31.43 ± 3.12 0.41 ± 0.10 Undefined N3 0.42 ± 0.56 0.29 ± 0.01 3.48 ± 0.14 1.44 ± 0.10 38.27 33.46 ± 2.32 
0.43 ± 0.09 Undefined S1 0.05 ± 0.22 0.70 ± 0.11 1.45 ± 0.20 3.63 ± 0.36 5.11 13.50 ± 1.67 0.37 ± 0.00 Undefined N2 0.30 ± 0.49 0.29 ± 0.02 3.42 ± 0.21 1.48 ± 0.15 28.32 29.65 ± 2.44 0.46 ± 0.01 Undefined N3 0.96 ± 0.70 0.65 ± 0.02 1.53 ± 0.04 3.47 ± 0.07 74.33 1.32 ± 0.00 0.22 ± 0.00 Undefined N4 0.96 ± 0.70 0.64 ± 0.01 1.55 ± 0.02 3.44 ± 0.04 73.98 1.31 ± 0.01 0.21 ± 0.00 Undefined O1 0.09 ± 0.29 0.71 ± 0.42 1.88 ± 0.84 3.29 ± 1.36 8.69 29.27 ± 5.26 0.44 ± 0.02 Undefined O1 0.64 ± 0.66 0.36 ± 0.01 2.76 ± 0.06 2.01 ± 0.06 53.93 12.92 ± 14.89 0.28 ± 0.09 H-oriented N2 0.35 ± 0.53 0.28 ± 0.02 3.59 ± 0.20 1.37 ± 0.14 32.97 27.66 ± 1.31 0.43 ± 0.01 Undefined N5 0.36 ± 0.53 0.28 ± 0.01 3.54 ± 0.14 1.40 ± 0.10 33.18 29.19 ± 3.89 0.44 ± 0.01 Undefined N2 0.57 ± 0.61 0.35 ± 0.01 2.83 ± 0.04 1.95 ± 0.04 50.92 7.40 ± 13.39 0.23 ± 0.01 Undefined N4 0.57 ± 0.57 0.43 ± 0.01 2.30 ± 0.07 2.46 ± 0.07 52.94 0.69 ± 0.01 0.20 ± 0.00 Undefined O1 0.53 ± 0.59 0.34 ± 0.01 2.98 ± 0.05 1.82 ± 0.04 47.71 15.71 ± 18.47 0.33 ± 0.12 Undefined N3 N9 H N1 N9 N7 0.14 ± 0.35 0.93 ± 0.28 0.89 ± 0.64 0.06 ± 0.25 0.43 ± 0.56 0.62 ± 0.38 3.72 ± 0.10 0.82 ± 0.01 0.23 ± 0.03 0.34 ± 0.03 2.12 ± 0.88 0.27 ± 0.01 1.22 ± 0.02 4.47 ± 0.62 2.93 ± 0.23 2.95 ± 1.29 7.79 ± 0.07 4.03 ± 0.04 0.84 ± 0.35 1.87 ± 0.20 13.39 92.10 73.61 6.42 39.57 22.11 ± 1.98 0.44 ± 0.00 1.05 ± 0.01 3.15 ± 3.45 33.93 ± 0.52 0.42 ± 0.01 0.18 ± 0.00 0.20 ± 0.00 0.29 ± 0.00 0.29 ± 0.10 Undefined O-oriented Undefined O-oriented H-oriented N3 S1 N NH1 NH2 0.33 ± 0.51 0.04 ± 0.19 0.77 ± 0.53 0.74 ± 0.46 0.73 ± 0.46 0.32 ± 0.03 – 1.33 ± 0.02 1.29 ± 0.01 1.15 ± 0.01 3.12 ± 0.28 – 0.75 ± 0.01 0.78 ± 0.01 0.87 ± 0.01 1.72 ± 0.22 – 5.24 ± 0.04 5.16 ± 0.03 4.88 ± 0.03 30.55 3.62 72 72.89 72.43 nan ± nan 27.86 ± 3.96 0.81 ± 0.00 1.35 ± 0.38 1.20 ± 0.44 0.48 ± 0.01 0.45 ± 0.02 0.19 ± 0.00 0.21 ± 0.01 0.21 ± 0.01 Undefined Undefined Undefined O-oriented O-oriented N1 N1 H N NH1 NH2 O1 N3 0.09 ± 0.29 0.32 ± 0.48 0.88 ± 0.53 0.73 ± 0.46 0.70 ± 0.47 
1.20 ± 0.79 0.93 ± 0.73 0.36 ± 0.14 0.29 ± 0.01 2.19 ± 0.03 1.81 ± 0.02 1.37 ± 0.02 0.69 ± 0.02 0.70 ± 0.05 3.07 ± 0.78 3.41 ± 0.06 0.46 ± 0.01 0.55 ± 0.01 0.73 ± 0.01 1.44 ± 0.04 1.44 ± 0.10 1.86 ± 0.81 1.48 ± 0.04 6.47 ± 0.04 6.00 ± 0.02 5.30 ± 0.04 3.62 ± 0.07 3.63 ± 0.18 8.45 31.74 79.3 72.76 69.18 81.29 71.11 25.84 ± 1.36 29.19 ± 2.52 8.19 ± 5.54 1.06 ± 0.48 0.87 ± 0.39 1.40 ± 0.39 1.34 ± 0.32 0.44 ± 0.01 0.45 ± 0.00 0.19 ± 0.00 0.21 ± 0.01 0.20 ± 0.00 0.21 ± 0.00 0.23 ± 0.00 Undefined Undefined Undefined O-oriented O-oriented Undefined Undefined N3 N9 N9 H N7 N1 N NH1 1.98 ± 0.68 0.18 ± 0.40 0.30 ± 0.47 0.17 ± 0.39 0.14 ± 0.36 0.92 ± 0.48 0.68 ± 0.48 3.01 ± 0.10 0.31 ± 0.02 0.35 ± 0.01 0.30 ± 0.02 0.80 ± 0.49 3.15 ± 0.08 1.85 ± 0.04 0.33 ± 0.01 3.26 ± 0.27 2.89 ± 0.11 3.33 ± 0.23 1.96 ± 1.26 0.32 ± 0.01 0.54 ± 0.01 7.26 ± 0.08 1.60 ± 0.20 1.90 ± 0.09 1.55 ± 0.18 3.43 ± 1.72 7.37 ± 0.06 6.05 ± 0.05 98.70 16.99 29.54 16.86 13.30 84.23 67.41 3.96 ± 1.48 25.32 ± 2.77 23.64 ± 1.66 29.43 ± 2.90 21.59 ± 5.25 4.04 ± 6.16 0.66 ± 0.06 0.19 ± 0.00 0.35 ± 0.00 0.34 ± 0.00 0.45 ± 0.01 0.39 ± 0.00 0.19 ± 0.00 0.20 ± 0.00 Undefined Undefined Undefined Undefined Undefined Undefined O-oriented (Continued) Frontiers in Pharmacology | www.frontiersin.org 13 April 2018 | Volume 9 | Article 395 Polêto et al. 
Aromatic Rings Interactions in Aqueous Solution TABLE 5 | Continued Molecule 5-methylindole 3-methyl-1H-indole Paraxanthine Theophylline Theobromine 2H-tetrazol-5-thiol 3-methylisoxazole 5-methylisoxazole Methylimidazole 2-Methylimidazole Guanine Atom AverHB τHB lifetimeHB GHB Percent CN OBDHB Orientation NH2 N1 N1 H 0.70 ± 0.47 0.26 ± 0.47 0.89 ± 0.33 1.50 ± 0.02 0.39 ± 0.02 3.45 ± 0.04 0.67 ± 0.01 2.56 ± 0.11 0.29 ± 0.00 5.53 ± 0.04 2.20 ± 0.11 7.60 ± 0.03 68.92 24.48 88.55 0.67 ± 0.03 0.20 ± 0.00 12.49 ± 0.43 0.34 ± 0.05 0.37 ± 0.00 0.19 ± 0.00 O-oriented Undefined O-oriented N1 N1 H 0.18 ± 0.40 0.35 ± 0.05 2.94 ± 0.40 1.88 ± 0.34 0.71 ± 0.48 1.07 ± 0.01 0.94 ± 0.01 4.69 ± 0.03 17.01 17.42 ± 2.94 0.38 ± 0.00 Undefined 70.01 17.37 ± 2.17 0.39 ± 0.01 Undefined O6 N3 N3 H O2 N9 N7 N1 0.58 ± 0.58 0.01 ± 0.10 0.54 ± 0.52 0.61 ± 0.61 0.79 ± 0.60 0.00 ± 0.05 0.03 ± 0.16 0.48 ± 0.01 – 0.65 ± 0.02 0.45 ± 0.01 0.86 ± 0.01 – – 2.08 ± 0.05 – 1.55 ± 0.06 2.22 ± 0.05 1.17 ± 0.02 – – 2.72 ± 0.06 – 3.45 ± 0.09 2.55 ± 0.06 4.15 ± 0.04 – – 52.88 1.02 52.58 54.13 69.54 0.21 2.66 0.76 ± 0.02 18.20 ± 4.10 18.43 ± 2.43 0.78 ± 0.02 0.95 ± 0.01 28.11 ± 1.53 24.93 ± 0.00 0.21 ± 0.00 0.39 ± 0.01 0.40 ± 0.01 0.21 ± 0.00 0.20 ± 0.00 0.47 ± 0.00 0.48 ± 0.00 Undefined Undefined Undefined Undefined Undefined Undefined Undefined N7 H O6 N3 O2 N9 N7 N1 0.33 ± 0.49 0.30 ± 0.48 0.02 ± 0.15 0.60 ± 0.62 0.16 ± 0.38 0.02 ± 0.13 0.01 ± 0.12 0.33 ± 0.01 0.27 ± 0.01 – 0.40 ± 0.01 0.29 ± 0.03 – – 3.00 ± 0.11 3.73 ± 0.16 – 2.53 ± 0.09 3.44 ± 0.33 – – 1.80 ± 0.09 1.27 ± 0.11 − 2.23 ± 0.09 1.48 ± 0.24 – – 32.06 28.37 2.41 53.03 15.73 1.78 1.48 23.41 ± 1.68 13.20 ± 1.45 20.89 ± 2.00 0.63 ± 0.21 27.76 ± 1.85 23.16 ± 3.82 25.52 ± 2.34 0.44 ± 0.01 0.38 ± 0.01 0.45 ± 0.01 0.22 ± 0.00 0.46 ± 0.01 0.43 ± 0.01 0.48 ± 0.01 Undefined Undefined Undefined Undefined Undefined Undefined Undefined O6 N3 O2 N9 N7 N1 H N1 0.26 ± 0.46 0.00 ± 0.06 0.97 ± 0.68 0.10 ± 0.30 0.01 ± 0.10 0.00 ± 0.00 0.03 ± 0.18 0.26 ± 0.01 – 0.69 
± 0.01 0.28 ± 0.02 – – 2.28 ± 1.56 3.89 ± 0.10 – 1.46 ± 0.01 3.54 ± 0.27 – – 0.99 ± 0.92 1.16 ± 0.06 – 3.59 ± 0.02 1.40 ± 0.19 – – 5.67 ± 2.35 25.11 0.33 76.39 9.65 1.01 0.00 3.16 12.25 ± 1.29 27.56 ± 1.00 1.22 ± 0.00 17.11 ± 2.33 25.82 ± 1.01 20.86 ± 1.70 18.46 ± 3.28 0.37 ± 0.01 0.48 ± 0.01 0.20 ± 0.00 0.42 ± 0.01 0.47 ± 0.01 0.41 ± 0.02 0.40 ± 0.00 Undefined Undefined Undefined Undefined Undefined Undefined Undefined N1 H S SH N3 N2 N1 N4 0.66 ± 0.50 0.08 ± 0.28 0.65 ± 0.59 1.05 ± 0.75 0.47 ± 0.58 0.01 ± 0.09 0.54 ± 0.61 0.56 ± 0.01 1.98 ± 2.59 0.36 ± 0.01 0.67 ± 0.01 0.34 ± 0.00 – 0.37 ± 0.01 1.79 ± 0.04 1.55 ± 1.03 2.75 ± 0.08 1.50 ± 0.03 2.93 ± 0.03 – 2.68 ± 0.10 3.08 ± 0.06 4.47 ± 2.73 2.02 ± 0.07 3.52 ± 0.06 1.86 ± 0.03 – 2.09 ± 0.09 65.29 8.34 59.18 76.69 42.55 0.90 47.48 24.63 ± 6.12 0.43 ± 0.01 24.11 ± 12.41 0.35 ± 0.00 15.22 ± 7.54 0.36 ± 0.00 1.44 ± 0.02 0.21 ± 0.00 21.28 ± 4.89 0.40 ± 0.01 16.82 ± 4.27 0.39 ± 0.02 31.80 ± 0.00 0.33 ± 0.11 Undefined Undefined Undefined Undefined Undefined Undefined Undefined O1 0.87 ± 0.70 0.59 ± 0.01 1.71 ± 0.02 3.20 ± 0.03 69.34 0.99 ± 0.25 0.21 ± 0.00 Undefined N2 0.94 ± 0.72 0.62 ± 0.01 1.61 ± 0.02 3.35 ± 0.03 72.36 1.32 ± 0.02 0.22 ± 0.00 Undefined O1 1.06 ± 0.71 0.79 ± 0.02 1.27 ± 0.03 3.95 ± 0.06 78.70 1.31 ± 0.02 0.20 ± 0.00 Undefined N2 1.03 ± 0.73 0.73 ± 0.01 1.37 ± 0.02 3.75 ± 0.03 76.61 1.42 ± 0.01 0.22 ± 0.00 Undefined N3 1.51 ± 0.68 1.51 ± 0.03 0.66 ± 0.01 5.55 ± 0.05 94.20 11.94 ± 12.59 0.19 ± 0.00 Undefined N1 0.03 ± 0.18 0.45 ± 0.28 2.86 ± 1.15 2.20 ± 1.28 3.26 29.76 ± 1.84 0.46 ± 0.01 Undefined N3 N1 N1 H 1.76 ± 0.68 0.11 ± 0.32 0.87 ± 0.36 2.28 ± 0.04 0.23 ± 0.02 1.86 ± 0.02 0.44 ± 0.01 4.45 ± 0.34 0.54 ± 0.01 6.57 ± 0.05 0.83 ± 0.19 6.06 ± 0.03 97.30 10.86 86.05 3.63 ± 0.94 0.19 ± 0.00 15.18 ± 1.88 0.40 ± 0.01 0.35 ± 0.01 0.19 ± 0.00 Undefined Undefined O-oriented N1 N1 H N7 N3 N NH1 NH2 0.00 ± 0.06 – 0.98 ± 0.15 11.66 ± 0.29 0.98 ± 0.65 0.81 ± 0.01 1.51 ± 0.64 2.33 ± 0.08 0.58 ± 0.57 1.16 ± 0.03 
0.71 ± 0.47 2.24 ± 0.03 0.67 ± 0.48 1.62 ± 0.02 –– 0.09 ± 0.00 10.62 ± 0.06 1.24 ± 0.02 3.99 ± 0.03 0.43 ± 0.02 6.62 ± 0.09 0.86 ± 0.02 4.90 ± 0.06 0.45 ± 0.01 6.53 ± 0.03 0.62 ± 0.01 5.72 ± 0.03 0.40 97.86 78.40 95.37 54.33 70.16 66.50 6.42 ± 4.70 2.00 ± 0.29 1.19 ± 0.01 2.98 ± 1.13 0.27 ± 0.02 29.15 ± 1.28 24.74 ± 7.45 0.27 ± 0.00 0.17 ± 0.00 0.20 ± 0.00 0.19 ± 0.00 0.19 ± 0.00 0.35 ± 0.00 0.34 ± 0.00 O-oriented O-oriented Undefined Undefined Undefined Undefined Undefined (Continued) Frontiers in Pharmacology | www.frontiersin.org 14 April 2018 | Volume 9 | Article 395 Polêto et al. Aromatic Rings Interactions in Aqueous Solution TABLE 5 | Continued Molecule Atom AverHB τHB lifetimeHB GHB Percent CN OBDHB Orientation O6 N9 N9 H 1.91 ± 0.71 2.01 ± 0.02 0.01 ± 0.09 – 0.97 ± 0.19 11.16 ± 0.09 0.50 ± 0.01 6.26 ± 0.03 –– 0.09 ± 0.00 10.51 ± 0.02 98.32 0.76 96.44 3.73 ± 1.35 0.23 ± 0.05 3.81 ± 3.08 0.28 ± 0.00 4.90 ± 5.94 0.17 ± 0.00 Undefined O-oriented O-oriented 1-Methylindole N1 0.22 ± 0.44 0.65 ± 0.20 1.80 ± 0.84 3.29 ± 0.96 20.67 26.89 ± 0.73 0.47 ± 0.00 Undefined Chlorobenzene Cl1 0.22 ± 0.44 0.34 ± 0.08 3.08 ± 0.70 1.80 ± 0.58 20.83 31.47 ± 1.09 0.36 ± 0.00 Undefined 1,2-dichlorobenzene Cl1 0.17 ± 0.39 0.38 ± 0.05 2.70 ± 0.33 2.09 ± 0.32 16.31 19.59 ± 8.61 0.36 ± 0.00 Undefined Cl2 0.17 ± 0.39 0.39 ± 0.05 2.64 ± 0.38 2.14 ± 0.34 16.29 18.60 ± 8.25 0.36 ± 0.00 Undefined 1,3-dichlorobenzene Cl1 0.17 ± 0.40 0.38 ± 0.08 2.78 ± 0.63 2.06 ± 0.56 16.74 30.86 ± 3.91 0.36 ± 0.00 Undefined Cl3 0.17 ± 0.39 0.37 ± 0.06 2.79 ± 0.40 2.01 ± 0.37 16.55 26.58 ± 8.67 0.36 ± 0.00 Undefined 1,2,3,4-tetrachlorobenzene Cl4 0.13 ± 0.35 0.39 ± 0.12 2.75 ± 0.67 2.11 ± 0.69 12.90 25.88 ± 5.26 0.36 ± 0.00 Undefined Cl1 0.13 ± 0.35 0.43 ± 0.14 2.52 ± 0.58 2.32 ± 0.69 13.06 23.23 ± 6.96 0.37 ± 0.00 Undefined Cl2 0.11 ± 0.32 0.43 ± 0.12 2.47 ± 0.63 2.37 ± 0.66 10.45 22.80 ± 7.24 0.37 ± 0.00 Undefined Cl3 0.11 ± 0.31 0.64 ± 0.32 1.93 ± 0.80 3.14 ± 1.13 10.32 22.24 ± 7.06 0.36 ± 0.00 
Undefined 1,2,3,5-tetrachlorobenzene Cl5 0.16 ± 0.38 0.29 ± 0.06 3.55 ± 0.69 1.44 ± 0.49 15.90 27.66 ± 8.74 0.36 ± 0.00 Undefined Cl1 0.14 ± 0.36 0.40 ± 0.11 2.68 ± 0.57 2.15 ± 0.60 14.00 23.15 ± 9.67 0.36 ± 0.00 Undefined Cl2 0.11 ± 0.32 0.48 ± 0.25 2.51 ± 0.92 2.46 ± 1.09 10.97 22.95 ± 7.34 0.37 ± 0.00 Undefined Cl3 0.14 ± 0.36 0.79 ± 0.71 1.98 ± 0.84 3.24 ± 1.66 13.90 23.20 ± 8.70 0.36 ± 0.00 Undefined 2-pyridone O2 N1 N1 H 1.55 ± 0.75 0.07 ± 0.27 0.78 ± 0.43 1.11 ± 0.02 0.24 ± 0.02 1.40 ± 0.02 0.90 ± 0.01 4.21 ± 0.40 0.71 ± 0.01 4.79 ± 0.04 0.98 ± 0.24 5.36 ± 0.03 93.28 7.37 77.75 1.82 ± 0.00 0.19 ± 0.00 19.48 ± 4.86 0.43 ± 0.02 26.08 ± 3.66 0.44 ± 0.01 Undefined Undefined Undefined 1,3,5-triazin-2(1H)-one N3 N5 N1 N1 H O2 1.09 ± 0.70 0.11 ± 0.32 0.03 ± 0.17 0.61 ± 0.51 0.61 ± 0.66 1.00 ± 0.03 1.01 ± 0.03 0.38 ± 0.20 3.10 ± 0.99 6.76 ± 12.20 1.62 ± 1.41 0.55 ± 0.02 1.81 ± 0.06 0.41 ± 0.02 2.45 ± 0.09 4.52 ± 0.08 1.91 ± 1.04 5.18 ± 4.11 3.06 ± 0.08 2.31 ± 0.09 80.87 10.86 3.06 59.92 51.10 1.35 ± 0.00 26.48 ± 3.24 25.39 ± 7.62 30.73 ± 1.51 28.96 ± 4.10 0.20 ± 0.00 0.45 ± 0.01 0.43 ± 0.02 0.46 ± 0.02 0.35 ± 0.00 Undefined Undefined Undefined Undefined Undefined Phenoxazine O5 N10 H N10 0.68 ± 0.65 0.64 ± 0.50 0.14 ± 0.36 0.45 ± 0.01 1.10 ± 0.03 0.20 ± 0.01 2.23 ± 0.04 0.91 ± 0.02 4.98 ± 0.25 2.54 ± 0.04 4.76 ± 0.06 0.55 ± 0.13 58.43 62.98 13.62 0.83 ± 0.01 0.21 ± 0.00 23.69 ± 5.31 0.45 ± 0.02 14.89 ± 2.51 0.40 ± 0.01 Undefined Undefined Undefined 7H-purine N1 N7 H N3 N9 N7 0.40 ± 0.55 0.48 ± 0.52 0.53 ± 0.61 0.42 ± 0.56 0.02 ± 0.15 0.31 ± 0.01 0.35 ± 0.01 0.37 ± 0.02 0.32 ± 0.01 1.28 ± 0.78 3.18 ± 0.09 2.82 ± 0.05 2.68 ± 0.12 3.13 ± 0.08 1.34 ± 1.03 1.66 ± 0.07 1.96 ± 0.05 2.09 ± 0.11 1.70 ± 0.06 4.52 ± 1.88 37.45 47.21 46.52 38.40 2.34 30.48 ± 3.09 30.09 ± 1.79 28.45 ± 4.00 29.46 ± 1.47 22.55 ± 4.08 0.32 ± 0.10 0.46 ± 0.01 0.45 ± 0.02 0.41 ± 0.08 0.43 ± 0.01 Undefined Undefined Undefined Undefined Undefined 1,4-benzodioxine O4 0.49 ± 0.57 0.39 ± 0.01 2.58 ± 0.07 
2.18 ± 0.07 45.03 0.50 ± 0.13 0.21 ± 0.00 Undefined O1 0.49 ± 0.57 0.39 ± 0.01 2.58 ± 0.06 2.18 ± 0.06 44.98 0.57 ± 0.02 0.21 ± 0.00 Undefined Colors represent different functional groups: red for oxygen, blue for nitrogen, orange for sulfur and green for halogen containing groups. calibration set. These information are condensed in the Tables 4, 5. Simulations were carried for 250 ns to properly sample multiple events of H-bond breakages and solvation shell rearrangements. Our results reveal non-obvious information about the Hbond availability and strength, as in the case of [5]pyridine/ [6]pyrimidine/[56]pyrazine/[70]pyridazine/[71]triazine series (Figure 3). While exchanging a pyridine by a pyrimidine ring might lead to apparent gain of a H-bond acceptor, nitrogens of pyrimidine present a GHB of nearly 1 kJ/mol lower than pyridine. Moreover, the Percent of time with at least one formed H-bond between water and pyridine nitrogen is higher than the ones in pyrimidine. When comparing pyridine with pyrazine (an addition of another N in para), H-bonds are very similar, so as the second and third solvation layers. Also, acceptance capacity in pyrimidine ring is very similar to triazine, where all three nitrogens are located in meta. Intriguingly, values for pyridine are very similar to the ones calculated for pyridazine, with a slight increase in OBDHB and a more compact second layer of solvation, as shown in Figure 3A. These results suggest that Frontiers in Pharmacology | www.frontiersin.org 15 April 2018 | Volume 9 | Article 395 Polêto et al. Aromatic Rings Interactions in Aqueous Solution FIGURE 3 | (A) Methyl substituitions: 2-Me (green), 3-Me (yellow), (B) Nearby N substitution: Northo (green), Nmeta (yellow), 4-Me (purple) and 2,4,6-Me (pink). Npara (purple). Solvation properties of aromatic rings in pyridine family. Radial distribution functions (RDFs) and H-bonding strength of N1 (blue) are affected by substitutions in ortho, meta, and para. 
another nitrogen acceptor in meta decreases the nitrogen acceptance capacity, while another nitrogen acceptor in ortho has little effect on H-bond capacity but a considerable effect on the structure of the solvation layers. These features can therefore impact binding inside receptors. Pyridazine, for example, has a larger OBDHB than pyridine, suggesting that these molecules can occupy the binding pocket in a different manner, impacting the entropic cost of binding.

Other cases have been equally surprising, such as the [39]quinoline/[40]isoquinoline pair. The main difference between them is the location of the acceptor nitrogen (closer to C8 in the quinoline fused ring). Counterintuitively, the AverHB of isoquinoline is slightly lower than that of quinoline, as is the τHB, and the GHB is almost 1.25 kJ/mol lower. The same properties for the pyridine ring lie between the values for quinoline and isoquinoline. In addition, the GHB values for the [51]quinazoline and [72]quinoxaline rings are almost 3 kJ/mol lower than those for quinoline and isoquinoline. In this sense, quinazoline and quinoxaline would be better candidates in fragment-based drug design due to the lower energetic cost of desolvation, while maintaining the H-bond capacity inside the receptor.

Another case of an aromatic nitrogen H-bond acceptor is [37]2,4,6-trimethylpyridine (Figure 3B). The presence of methyl groups in both ortho positions drastically reduces the availability of H-bonds, as shown in Figure 3, and diminishes the residence time of the accepted H-bond. The presence of only one methyl group in ortho, however, appears to have a modest effect, slightly favoring the presence of an H-bond at the nitrogen of [19]2-methylpyridine.
Moreover, the second and third solvation layers of 2-methylpyridine and 2,4,6-trimethylpyridine are dismantled, while the same behavior is not observed for [20]3- and [21]4-methylpyridine.

Other non-obvious events can be observed regarding H-bond donation by hydroxyl groups. In the case of [12]phenol, the energy required to break a donated H-bond (∼10 kJ/mol) is almost double that required to break an accepted one (∼5.70 kJ/mol), in agreement with the QM data reported by Parthasarath et al. (2005) at the HF, MP2, and DFT levels. And while phenol and [24]phenylmethanol might appear interchangeable during lead optimization, the GHB of both accepted and donated H-bonds in the hydroxyl group is almost 1 kJ/mol higher for phenylmethanol. When the thermodynamics of binding is targeted during drug design, these energetic costs of desolvation can play a crucial role. As expected, benzenethiol proved to be a poor H-bond acceptor in our simulations, but a reasonable H-bond donor. In terms of vicinity effects, methylation in ortho seems to have little effect on hydroxyl groups, since the properties evaluated for the series [12]phenol/[25]2-methylphenol/[26]3-methylphenol/[27]4-methylphenol are very similar.

It is well known that halogens are widely used in drug design, and the roles of halogen bonds (X-bonds) and H-bonds have been investigated thoroughly (Rendine et al., 2011; Ford and Ho, 2016; Lin and Mackerell, 2017). In general, the H-bonding strength decreases with increasing halogen radius (F > Cl > Br > I), while the halogen bond strength increases (Rendine et al., 2011). In this work, we investigated how fluorine and chlorine behave as H-bond acceptors in water. In the case of [7]fluorobenzene, the GHB of 1.54 ± 0.24 kJ/mol is consistent with a weak H-bond (Domagała et al., 2017). The other fluorinated rings in the series (1,2- and 1,3-difluorobenzene, and 1,2,3,4- and 1,2,3,5-tetrafluorobenzene [8–11]) have similar values, varying from 1.5 to 2.2 kJ/mol.
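To put free-energy differences of this size in perspective, the short sketch below converts a difference in GHB into a Boltzmann factor at a given temperature. It assumes that such a difference can be read as a simple two-state free-energy gap, which is an interpretation made here for illustration and not necessarily how the values in Tables 4 and 5 were derived. The function name and the 298.15 K default are likewise illustrative choices.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def boltzmann_ratio(delta_g_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Return exp(delta_G / RT) for a free-energy difference in kJ/mol.

    Assumption (illustrative only): the difference is treated as a two-state
    free-energy gap, so the result is the factor by which the equilibrium
    ratio of formed to broken H-bonds would change.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(delta_g_kj_mol / (R_KJ_PER_MOL_K * temperature_k))


# Differences quoted in the text (kJ/mol)
phenol_gap = 10.0 - 5.70  # donated vs. accepted H-bond in phenol
print(f"1.00 kJ/mol -> factor {boltzmann_ratio(1.0):.2f}")
print(f"{phenol_gap:.2f} kJ/mol -> factor {boltzmann_ratio(phenol_gap):.2f}")
```

Under this assumption, a difference of about 1 kJ/mol corresponds to a factor of roughly 1.5 at room temperature, so the sub-kJ/mol variations between similar rings are modest, whereas the gap between donated and accepted H-bonds in phenol is substantial.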
Regarding the chlorinated ring series (chlorobenzene, 1,2-, 1,3-, 1,2,3,4-, and 1,2,3,5-tetrachlorobenzene [94–98]), GHB ranged from 1.80 to 3.24 kJ/mol, contradicting the expected behavior. X-bonds are often poorly described in MM, since it treats atoms as spheres with a uniform electrostatic surface and therefore cannot describe the region of positive potential required for such an interaction. In fact, we verified visually that the water molecules surrounding fluorine and chlorine have their hydrogens oriented toward the halogens, confirming that we measured H-bonds and not X-bonds.

Regarding oxygen atoms within the aromatic ring, AverHB values are generally lower than expected. It is well known that oxygens in heterocycles act as H-bond acceptors (Kaur and Khanna, 2011), but our model does not reproduce this tendency. It is important to note that GROMOS53A6 does not have specific parameters for oxygens within aromatic rings, so LJ parameters from ethers were employed. Not surprisingly, the calculated properties for the oxygen atom in furan and benzofuran are very similar to those in methoxybenzene and phenoxybenzene. This result suggests that the description of oxygen-containing aromatic rings in aqueous solution might be improved by specific LJ parameters. Moreover, we tested the ether LJ parameters reported in Horta et al. (2011) in our simulations of furan and methoxybenzene in water, which yielded lower AverHB and GHB (data not shown). The new force field parameters developed in this work can be obtained upon request.

4.3. Impacts in Drug Design
Recently, several authors have questioned the ligand efficiency (LE) approach as an optimization tool and its actual power to lead to high-affinity compounds (Abad-Zapatero, 2007; Morgan et al., 2011; Cavalluzzi et al., 2017). Another recent review (DeGoey et al., 2017) has pointed out the emergence of approved drugs that violate Lipinski's rule of 5 and correlated them with properties such as the number of aromatic rings and rotatable bonds.
Freire (2009) has proposed an experimental thermodynamic approach to guide the drug design process, and those results suggest that tuning the enthalpy and entropy of ligand binding is not only experimentally possible, but also predictable. The GROMOS series of force fields therefore presents an extra advantage here, owing to its calibration to reproduce free energies of solvation and other thermodynamic properties. In this sense, we have parameterized and validated a calibration set of 42 aromatic rings commonly used in drug design using thermodynamic properties in the condensed phase. Afterwards, we performed a study with a larger dataset of 103 heteroaromatic rings in order to understand how these molecules interact with water and to prospect and map potential interactions with target receptors. The water molecules probe the occurrence of hydrogen bonds, while the absence of these interactions, together with the distance of the first solvation sphere, may indicate sites for hydrophobic interactions. With this information at hand, medicinal chemists and pharmacologists may make quantitative estimates of how each functional group may or may not interact with its target protein, as well as identify the potential influence of nearby chemical modifications. These properties (and a handful of others) are compiled in Tables 4 and 5 and can be used as a reference during lead optimization. The strategy employed here could be used to broaden the spectrum of drug fragments for which chemical events are accurately described by molecular dynamics. In addition, it can improve the description of drug-receptor complexation dynamics for other molecules of interest, of the molecular recognition of drugs, and of signal transduction mediated by conformational changes of ligands.
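As a rough illustration of how the position of the first solvation sphere mentioned above can be read from a radial distribution function, the sketch below locates the first local maximum of g(r) and the first local minimum after it. The function name and the g(r) values are made up for this example and do not come from the simulations reported here; real RDFs are noisy and would usually need smoothing before this kind of peak search.

```python
from typing import List, Tuple


def first_shell(r: List[float], g: List[float]) -> Tuple[float, float]:
    """Return (r_peak, r_min): the position of the first local maximum of g(r)
    and of the first local minimum that follows it."""
    if len(r) != len(g) or len(r) < 3:
        raise ValueError("r and g must have equal length >= 3")
    peak = None
    for i in range(1, len(g) - 1):
        if peak is None:
            # Look for the first point higher than its left neighbour
            # and not lower than its right neighbour.
            if g[i] > g[i - 1] and g[i] >= g[i + 1]:
                peak = i
        elif g[i] < g[i - 1] and g[i] <= g[i + 1]:
            # First local minimum after the peak closes the first shell.
            return r[peak], r[i]
    raise ValueError("no complete first shell found")


# Synthetic, made-up g(r) used only to exercise the function (distances in nm)
r = [0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
g = [0.0, 0.4, 1.8, 1.1, 0.7, 0.9, 1.0]
r_peak, r_min = first_shell(r, g)
print(r_peak, r_min)
```

A larger r_peak for the same acceptor atom would indicate a first solvation sphere sitting farther from the ring, which is the kind of difference the text associates with potential hydrophobic contact sites.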
In fact, by assessing the strength and availability of interactions between aromatic rings and water, the results presented here not only offer detailed quantitative information about the potential interactions that each individual aromatic ring can make with its surroundings, but also shed light on the energetics of biological events such as the dismantling of solvation shells, an important step in the ligand binding process.

5. CONCLUSIONS
In this work, we have successfully produced topologies for a calibration set of 42 aromatic rings, using the physicochemical properties of the respective organic liquids as targets. Our strategy showed very competitive predictive power when compared with other force fields, while offering a simple approach to describing aromatic rings in molecular dynamics simulations that can be easily extended to other rings. In addition, H-bond availability and solvent accessibility are difficult, non-obvious properties to predict from two-dimensional data, yet they are essential for medicinal chemistry. Here, we have simulated more than 100 aromatic rings commonly used in drug design in aqueous solvent in order to assess dynamical chemical properties such as the average number of H-bonds, their lifetime, residence time, and free energy of breakage. We have thus described a low-cost approach based on molecular dynamics simulations to obtain valuable information that could be useful both to predict the enthalpic cost of desolvation and to help medicinal chemists and pharmacologists interpret pharmacological data. Our results provide a large database of quantitative information for a total of 103 aromatic rings most commonly used in drug design that can guide medicinal chemists in future drug design efforts.
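The kind of per-atom summary referred to above (average number of H-bonds and percentage of time with at least one H-bond) can be sketched from a per-frame count series as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary keys, and the example series are hypothetical, and whether the tabulated values were computed in exactly this way is not stated in the text.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def hbond_summary(counts_per_frame: Sequence[int]) -> Dict[str, float]:
    """Summarise a per-frame H-bond count series.

    Returns the average number of H-bonds, its population standard deviation,
    and the percentage of frames with at least one H-bond.
    """
    if not counts_per_frame:
        raise ValueError("empty series")
    occupied = sum(1 for c in counts_per_frame if c >= 1)
    return {
        "average": mean(counts_per_frame),
        "std": pstdev(counts_per_frame),
        "percent_occupied": 100.0 * occupied / len(counts_per_frame),
    }


# Made-up series for illustration only: 10 frames, H-bond present in 2 of them
summary = hbond_summary([0, 0, 1, 0, 0, 0, 1, 0, 0, 0])
print(summary)
```

For an atom that forms at most one H-bond at a time, the average and the occupied fraction coincide, and the standard deviation follows from that fraction alone, which is why a low average can carry a comparatively large spread.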
AUTHOR CONTRIBUTIONS
MP carried out quantum calculations, molecular dynamics simulations, and data analyses, and drafted the manuscript. VR contributed to the simulation protocols and the manuscript draft. BG wrote in-house scripts for dipole-based charge assignment and data analyses. MD contributed to the manuscript draft. RL contributed to the simulation protocols and the manuscript draft. HV contributed to data analyses and the manuscript draft.

FUNDING
The authors thank the funding agencies Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação de Amparo à Pesquisa do Rio Grande do Sul (FAPERGS). This work was partially supported by grants from FAPERGS/PRONUPEQ (16/2551-0000520-6).

ACKNOWLEDGMENTS
Research developed with the support of the Centro Nacional de Supercomputação (CESUP) of the Universidade Federal do Rio Grande do Sul (UFRGS). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphar.2018.00395/full#supplementary-material

REFERENCES
Abad-Zapatero, C. (2007). Ligand efficiency indices for effective drug discovery. Expert Opin. Drug Dis. 2, 469–488. doi: 10.1517/17460441.2.4.469
Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., et al. (2015). GROMACS: high performance molecular simulations through multilevel parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25. doi: 10.1016/j.softx.2015.06.001
Aldeghi, M., Malhotra, S., Selwood, D. L., and Chan, A. W. (2014). Two- and three-dimensional rings in drugs. Chem. Biol. Drug Des. 83, 450–461. doi: 10.1111/cbdd.12260
Anderson, A. C. (2003). The process of structure-based drug design. Chem. Biol. 10, 787–797. doi: 10.1016/j.chembiol.2003.09.002
Aqvist, J., Medina, C., and Samuelsson, J. E. (1994).
A new method for predicting binding affinity in computer-aided drug design. Protein Eng. 7, 385–391.
Bajorath, J. (2015). Computer-aided drug discovery. F1000 Res. 4:630. doi: 10.12688/f1000research.6653.1
Barker, J. A. and Watts, R. O. (1973). Monte Carlo studies of the dielectric properties of water-like models. Mol. Phys. 26, 789–792.
Bayly, C. I., Cieplak, P., Cornell, W., and Kollman, P. A. (1993). A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J. Phys. Chem. 97, 10269–10280.
Bemis, G. W. and Murcko, M. A. (1996). The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893.
Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A., and Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690.
Beutler, T. C., Mark, A. E., van Schaik, R. C., Gerber, P. R., and van Gunsteren, W. F. (1994). Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chem. Phys. Lett. 222, 529–539.
Biela, A., Khayat, M., Tan, H., Kong, J., Heine, A., Hangauer, D., et al. (2012). Impact of ligand and protein desolvation on ligand binding to the S1 pocket of thrombin. J. Mol. Biol. 418, 350–366. doi: 10.1016/j.jmb.2012.01.054
Blundell, C. D., Nowak, T., and Watson, M. J. (2016). Measurement, interpretation and use of free ligand solution conformations in drug discovery. Prog. Med. Chem. 55, 45–147. doi: 10.1016/bs.pmch.2015.10.003
Blundell, C. D., Packer, M. J., and Almond, A. (2013). Quantification of free ligand conformational preferences by NMR and their relationship to the bioactive conformation. Bioorg. Med. Chem. 21, 4976–4987. doi: 10.1016/j.bmc.2013.06.056
Broughton, H. B. and Watson, I. A. (2004). Selection of heterocycles for drug design. J. Mol. Graph Model. 23, 51–58. doi: 10.1016/j.jmgm.2004.03.016
Bussi, G., Donadio, D., and Parrinello, M. (2007).
Canonical sampling through velocity rescaling. J. Chem. Phys. 126:014101. doi: 10.1063/1.2408420
Butler, K. T., Luque, F. J., and Barril, X. (2009). Toward accurate relative energy predictions of the bioactive conformation of drugs. J. Comput. Chem. 30, 601–610. doi: 10.1002/jcc.21087
Caleman, C., van Maaren, P. J., Hong, M., Hub, J. S., Costa, L. T., and van der Spoel, D. (2012). Force field benchmark of organic liquids: Density, enthalpy of vaporization, heat capacities, surface tension, isothermal compressibility, volumetric expansion coefficient, and dielectric constant. J. Chem. Theor. Comput. 8, 61–74. doi: 10.1021/ct200731v
Cavalluzzi, M. M., Mangiatordi, G. F., Nicolotti, O., and Lentini, G. (2017). Ligand efficiency metrics in drug discovery: the pros and cons from a practical perspective. Expert Opin. Drug Dis. 12, 1087–1104. doi: 10.1080/17460441.2017
Csermely, P., Korcsmáros, T., Kiss, H. J., London, G., and Nussinov, R. (2012). Structure and dynamics of molecular networks: A novel paradigm of drug discovery. A comprehensive review. Pharmacol. Ther. 138, 333–408. doi: 10.1016/j.pharmthera.2013.01.016
Danishuddin and Khan, A. U. (2016). Descriptors and their selection methods in QSAR analysis: paradigm for drug design. Drug Discov. Today 21, 1291–1302. doi: 10.1016/j.drudis.2016.06.013
Daura, X., Mark, A. E., and Van Gunsteren, W. F. (1998). Parametrization of aliphatic CHn united atoms of GROMOS96 force field. J. Comput. Chem. 19, 535–547.
DeGoey, D. A., Chen, H. J., Cox, P. B., and Wendt, M. D. (2017). Beyond the rule of 5: lessons learned from AbbVie’s drugs and compound collection. J. Med. Chem. 61, 2636–2651. doi: 10.1021/acs.jmedchem.7b00717
Dobbs, K. D. and Hehre, W. J. (1986). Molecular orbital theory of the properties of inorganic and organometallic compounds 4.
Extended basis sets for third- and fourth-row, main-group elements. J. Comput. Chem. 7, 359–378.
Domagała, M., Lutyńska, A., and Palusiak, M. (2017). Halogen bond versus hydrogen bond: The many-body interactions approach. Int. J. Quantum Chem. 117:e25348. doi: 10.1002/qua.25348
Feenstra, K. A., Hess, B., and Berendsen, H. J. C. (1999). Improving efficiency of large time-scale molecular dynamics simulations of hydrogen rich systems. J. Comput. Chem. 20, 786–798.
Ferenczy, G. G. and Keserü, G. M. (2010). Thermodynamics guided lead discovery and optimization. Drug Discov. Today 15, 919–932. doi: 10.1016/j.drudis.2010.08.013
Fock, V. (1930). Näherungsmethode zur Lösung des quantenmechanischen Mehrkörperproblems. Z. Phys. 61, 126–148.
Ford, M. C. and Ho, P. S. (2016). Computational tools to model halogen bonds in medicinal chemistry. J. Med. Chem. 59, 1655–1670. doi: 10.1021/acs.jmedchem.5b00997
Freire, E. (2009). A thermodynamic approach to the affinity optimization of drug candidates. Chem. Biol. Drug Des. 74, 468–472. doi: 10.1111/j.1747-0285.2009.00880.x
Frisch, M. J., Trucks, G. W., Schlegel, H. B., Scuseria, G. E., Robb, M. A., Cheeseman, J. R., et al. (2016). Gaussian 09, Revision A.02. Wallingford, CT.
Ganesan, A., Coote, M. L., and Barakat, K. (2017). Molecular dynamics-driven drug discovery: leaping forward with confidence. Drug Discov. Today 22, 249–269. doi: 10.1016/j.drudis.2016.11.001
Gao, Q., Yang, L., and Zhu, Y. (2010). Pharmacophore based drug design approach as a practical process in drug discovery. Curr. Comput. Aid Drug 6, 37–49. doi: 10.2174/157340910790980151
Gleeson, M. P. (2008). Generation of a set of simple, interpretable ADMET rules of thumb. J. Med. Chem. 51, 817–834. doi: 10.1021/jm701122q
Gleeson, M. P., Hersey, A., Montanari, D., and Overington, J. (2011). Probing the links between in vitro potency, ADMET and physicochemical parameters. Nat. Rev. Drug Discov. 10, 197–208. doi: 10.1038/nrd3367
Gumbart, J. C., Roux, B., and Chipot, C.
(2013). Standard binding free energies from computer simulations: what is the best strategy? J. Chem. Theor. Comput. 9, 794–802. doi: 10.1021/ct3008099
Halgren, T. A. (1996). Merck molecular force field. II. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions. J. Comput. Chem. 17, 520–552.
Hanwell, M. D., Curtis, D. E., Lonie, D. C., Vandermeersch, T., Zurek, E., and Hutchison, G. R. (2012). Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminformatics 4:17. doi: 10.1186/1758-2946-4-17
Hartree, D. R. and Hartree, W. (1935). Self-consistent field, with exchange, for beryllium. Proc. R. Soc. A Math. Phys. 150, 9–33.
Heinz, T. N., van Gunsteren, W. F., and Hünenberger, P. H. (2001). Comparison of four methods to compute the dielectric permittivity of liquids from molecular dynamics simulations. J. Chem. Phys. 115, 1125–1136. doi: 10.1063/1.1379764
Hess, B. (2008). P-LINCS: a parallel linear constraint solver for molecular simulation. J. Chem. Theor. Comput. 4, 116–122. doi: 10.1021/ct700200b
Hess, B., Bekker, H., Berendsen, H. J. C., and Fraaije, J. G. E. M. (1997). LINCS: a linear constraint solver for molecular simulations. J. Comput. Chem. 18, 1463–1472.
Hopkins, A. L., Keserü, G. M., Leeson, P. D., Rees, D. C., and Reynolds, C. H. (2014). The role of ligand efficiency metrics in drug discovery. Nat. Rev. Drug Discov. 13, 105–121. doi: 10.1038/nrd4163
Horta, B. A., Merz, P. T., Fuchs, P. F., Dolenc, J., Riniker, S., and Hünenberger, P. H. (2016). A GROMOS-compatible force field for small organic molecules in the condensed phase: the 2016H66 parameter set. J. Chem. Theor. Comput. 12, 3825–3850. doi: 10.1021/acs.jctc.6b00187
Horta, B. A., Fuchs, P. F., van Gunsteren, W. F., and Hünenberger, P. H. (2011). New interaction parameters for oxygen compounds in the GROMOS force field: Improved pure-liquid and solvation properties for alcohols, ethers, aldehydes, ketones, carboxylic acids, and esters.
J. Chem. Theor. Comput. 7, 1016–1031. doi: 10.1021/ct1006407
Jordan, A. M. and Roughley, S. D. (2009). Drug discovery chemistry: a primer for the non-specialist. Drug Discov. Today 14, 731–744. doi: 10.1016/j.drudis.2009.04.005
Jorgensen, W. L., Maxwell, D. S., and Tirado-Rives, J. (1996). Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 118, 11225–11236.
Kaur, D. and Khanna, S. (2011). Intermolecular hydrogen bonding interactions of furan, isoxazole and oxazole with water. Comput. Theor. Chem. 963, 71–75. doi: 10.1016/j.comptc.2010.09.011
Keserü, G. M. and Makara, G. M. (2009). The influence of lead discovery strategies on the properties of drug candidates. Nat. Rev. Drug Discov. 8, 203–212. doi: 10.1038/nrd2796
Kunz, A. P., and van Gunsteren, W. F. (2009). Development of a nonlinear classical polarization model for liquid water and aqueous solutions: COS/D. J. Phys. Chem. A 113, 11570–11579. doi: 10.1021/jp903164s
Lee, C. H., Huang, H. C., and Juan, H. F. (2011). Reviewing ligand-based rational drug design: the search for an ATP synthase inhibitor. Int. J. Mol. Sci. 12, 5304–5318. doi: 10.3390/ijms12085304
Leeson, P. D. and Springthorpe, B. (2007). The influence of drug-like concepts on decision-making in medicinal chemistry. Nat. Rev. Drug Discov. 6, 881–890. doi: 10.1038/nrd2445
Li, S., Smith, D. G., and Patkowski, K. (2015). An accurate benchmark description of the interactions between carbon dioxide and polyheterocyclic aromatic compounds containing nitrogen. Phys. Chem. Chem. Phys. 17, 16560–16574. doi: 10.1039/c5cp02365c
Limongelli, V., Bonomi, M., and Parrinello, M. (2013). Funnel metadynamics as accurate binding free-energy method. Proc. Natl. Acad. Sci. U.S.A. 110, 6358–6363. doi: 10.1073/pnas.1303186110
Lin, F. Y. and Mackerell, A. D. (2017). Do halogen-hydrogen bond donor interactions dominate the favorable contribution of halogens to ligand-protein binding? J.
Phys. Chem. B 121, 6813–6821. doi: 10.1021/acs.jpcb.7b04198
Lionta, E., Spyrou, G., Vassilatis, D. K., and Cournia, Z. (2014). Structure-based virtual screening for drug discovery: principles, applications and recent advances. Curr. Top. Med. Chem. 14, 1923–1938. doi: 10.2174/1568026614666140929124445
Lounnas, V., Ritschel, T., Kelder, J., McGuire, R., Bywater, R. P., and Foloppe, N. (2013). Current progress in structure-based rational drug design marks a new mindset in drug discovery. Comput. Struct. Biotechnol. J. 5:e201302011. doi: 10.5936/csbj.201302011
Matczak, P. and Wojtulewski, S. (2015). Performance of Møller-Plesset second-order perturbation theory and density functional theory in predicting the interaction between stannylenes and aromatic molecules. J. Mol. Model. 21, 41. doi: 10.1007/s00894-015-2589-1
Mennucci, B. and Tomasi, J. (1997). Continuum solvation models: a new approach to the problem of solute’s charge distribution and cavity boundaries. J. Chem. Phys. 106, 5151–5158.
Møller, C. and Plesset, M. S. (1934). Note on an approximation treatment for many-electron systems. Phys. Rev. 46, 618–622.
Mondal, J., Friesner, R. A., and Berne, B. J. (2014). Role of desolvation in thermodynamics and kinetics of ligand binding to a kinase. J. Chem. Theor. Comput. 10, 5696–5705. doi: 10.1021/ct500584n
Morgan, S., Grootendorst, P., Lexchin, J., Cunningham, C., and Greyson, D. (2011). The cost of drug development: a systematic review. Health Policy 100, 4–17. doi: 10.1016/j.healthpol.2010.12.002
Nosé, S. (1984). A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 52, 255–268.
Oostenbrink, C., Villa, A., Mark, A. E., and van Gunsteren, W. F. (2004). A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6. J. Comput. Chem. 25, 1656–1676. doi: 10.1002/jcc.20090
Parrinello, M. and Rahman, A. (1981).
Polymorphic transitions in single crystals: a new molecular dynamics method. J. Appl. Phys. 52, 7182–7190.
Parthasarath, R., Subramanian, V., and Sathyamurthy, N. (2005). Hydrogen bonding in phenol, water, and phenol-water clusters. J. Phys. Chem. A 109, 843–850. doi: 10.1021/jp046499r
Paul, S. M., Mytelka, D. S., Dunwiddie, C. T., Persinger, C. C., Munos, B. H., Lindborg, S. R., et al. (2010). How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat. Rev. Drug Discov. 9, 203. doi: 10.1038/nrd3078
Petersson, G. A., Bennett, A., Tensfeldt, T. G., Al-Laham, M. A., Shirley, W. A., and Mantzaris, J. (1988). A complete basis set model chemistry. I. The total energies of closed-shell atoms and hydrides of the first-row elements. J. Chem. Phys. 89, 2193–2218.
Rendine, S., Pieraccini, S., Forni, A., and Sironi, M. (2011). Halogen bonding in ligand–receptor systems in the framework of classical force fields. Phys. Chem. Chem. Phys. 13:19508. doi: 10.1039/c1cp22436k
Reynolds, C. H. and Holloway, M. K. (2011). Thermodynamics of ligand binding and efficiency. ACS Med. Chem. Lett. 2, 433–437. doi: 10.1021/ml200010k
Roughley, S. D. and Jordan, A. M. (2011). The medicinal chemist’s toolbox: an analysis of reactions used in the pursuit of drug candidates. J. Med. Chem. 54, 3451–3479. doi: 10.1021/jm200187y
Rusu, V. H., Baron, R., and Lins, R. D. (2014). PITOMBA: Parameter Interface for Oligosaccharide Molecules Based on Atoms. J. Chem. Theor. Comput. 10, 5068–5080. doi: 10.1021/ct500455u
Schuler, L. D., Daura, X., and van Gunsteren, W. F. (2001). An improved GROMOS96 force field for aliphatic hydrocarbons in the condensed phase. J. Comput. Chem. 22, 1205–1218. doi: 10.1002/jcc.1078
Shahlaei, M. (2013). Descriptor selection methods in quantitative structure-activity relationship studies: a review study. Chem.
Rev. 113, 8093–8103. doi: 10.1021/cr3004339
Shirts, M. R. and Pande, V. S. (2005). Comparison of efficiency and bias of free energies computed by exponential averaging, the Bennett acceptance ratio, and thermodynamic integration. J. Chem. Phys. 122:144107. doi: 10.1063/1.1873592
Sliwoski, G., Kothiwale, S., Meiler, J., and Lowe, E. W. (2013). Computational methods in drug discovery. Pharmacol. Rev. 66, 334–395. doi: 10.1124/pr.112.007336
Taylor, R. D., MacCoss, M., and Lawson, A. D. (2017). Combining molecular scaffolds from FDA approved drugs: application to drug discovery. J. Med. Chem. 60, 1638–1647. doi: 10.1021/acs.jmedchem.6b01367
Taylor, R. D., MacCoss, M., and Lawson, A. D. (2014). Rings in drugs. J. Med. Chem. 57, 5845–5859. doi: 10.1021/jm4017625
Tironi, I. G., Sperb, R., Smith, P. E., and van Gunsteren, W. F. (1995). A generalized reaction field method for molecular dynamics simulations. J. Chem. Phys. 102, 5451–5459.
Van Gunsteren, W. F. and Berendsen, H. J. C. (1988). A leap-frog algorithm for stochastic dynamics. Mol. Simulat. 1, 173–185.
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A., and Case, D. A. (2004). Development and testing of a general Amber force field. J. Comput. Chem. 25, 1157–1174. doi: 10.1002/jcc.20035
Waring, M. J. (2009). Defining optimum lipophilicity and molecular weight ranges for drug candidates-Molecular weight dependent lower log D limits based on permeability. Bioorg. Med. Chem. Lett. 19, 2844–2851. doi: 10.1016/j.bmcl.2009.03.109
Waring, M. J. (2010). Lipophilicity in drug discovery. Expert Opin. Drug Dis. 5, 235–248. doi: 10.1517/17460441003605098
Welsch, M. E., Snyder, S. A., and Stockwell, B. R. (2010). Privileged scaffolds for library design and drug discovery. Curr. Opin. Chem. Biol. 14, 347–361. doi: 10.1016/j.cbpa.2010.02.018
Woo, H. J. and Roux, B. (2005). Calculation of absolute protein-ligand binding free energy from computer simulations. Proc. Natl. Acad. Sci. U.S.A. 102, 6825–6830.
doi: 10.1073/pnas.0409005102
Zhao, H. and Caflisch, A. (2015). Molecular dynamics in drug design. Eur. J. Med. Chem. 91, 4–14. doi: 10.1002/ijch.201400009
Zwanzig, R. W. (1954). High-temperature equation of state by a perturbation method. I. nonpolar gases. J. Chem. Phys. 22, 1420–1426.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer GT and handling Editor declared their shared affiliation.

Copyright © 2018 Polêto, Rusu, Grisci, Dorn, Lins and Verli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Polêto et al. Supplementary Material

Table S1. Parameters used to describe the torsional profile within the working set.
Molecule Name            Code  Dihedral atoms
Phenol                   PHN   CD1-CG-OH-HH
Nitrobenzene             NBE   CD1-CG-N-O1
Benzenethiol             BTH   CD1-CG-SH-HH
Trifluoromethylbenzene   TFM   CD1-CG-CF-F1
Benzaldehyde             BNZ   CD2-CG-C-O
Methoxybenzene           MBO   CD1-CG-OG-CH3
Phenylmethanol           PHM   CD1-CG-CO-OH
Ethenylbenzene           ENB   CD2-CG-CB-CA
1-phenylethanone         1PE   CD1-CG-CO-O
Ethylbenzene             ETB   CD2-CG-CB-CA
(1-methylethyl)-benzene  MEB   CD1-CG-CH-CC1
Aniline                  ANI   CD1-CG-NG-H1
Methylbenzoate           MBA   CD1-CG-CO-O1
Methylbenzoate           MBA   CG-CO-O2-CH3
Phenoxybenzene           PBE   CD1-CG-OG-C1
Phase shift: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 120 0 0 0 0 0 0 0 0
Coefficient: 7.541 -9.816 8.771 -9.487 0.450 -0.953 0.080 15.107 -15.653 5.941 -11.443 0.816 -1.479 4.687 -6.644 11.226 -12.760 -0.500 0.401 -0.712 -1.309 9.121 12.632 22.045 -10.542 23.399 5.154 -23.386 -2.559 0.505
Multiplicity: 0 2 0 2 4 2 6 0 2 0 2 0 2 0 2 0 2 2 0 6 2 2 0 1 2 0 1 2 2 6

Table S2. Experimental thermodynamic properties used as reference in this work. Temperature (T) in K, density (ρ) in g/cm³, enthalpy of vaporization (∆Hvap) in kJ/mol, thermal expansion coefficient (αP) in 10⁻³/K, isothermal compressibility (κT) in 1/GPa, dielectric constant (ε), classic isobaric heat capacity (Cpcla) in J/(mol·K), molecular weight (MW) in g/mol, and free energy of solvation (∆Ghyd) in kJ/mol.
Molecule Name              T       ρ       ∆Hvap   αP    κT     ε      Cpcla   MW        ∆Ghyd
Benzene                    293.15  0.8765  34.08   1.23  0.96   2.27   135.70  78.112    -3.598
Pyrroline                  298.15  0.9653  45.15   0.87  0.65   7.92   128.20  67.09     -20.000
Furan                      298.15  0.9313  27.46   0.73  -      2.94   114.80  68.074    -
Fluorobenzene              298.15  1.0191  34.58   1.18  0.94   5.34   146.30  96.102    -3.347
1,2-fluorobenzene          298.15  1.1500  36.11   1.20  0.94*  13.59  159.00  114.093   -
1,3-fluorobenzene          298.15  1.1620  36.58   1.20  0.94*  5.06   159.10  114.093   -
1,2,3,4-fluorobenzene      298.15  1.4161  36.61   -     0.94*  -      190.06  150.074   -
1,2,3,5-fluorobenzene      298.15  1.3930  35.40   -     0.94*  -      190.19  150.074   -
Pyridine                   298.15  0.9778  40.15   1.02  0.71   12.98  135.60  79.101    -19.623
Pyrimidine                 298.15  1.0164  49.81   0.89  0.71*  -      133.70  80.088    -
Thiophene                  298.15  1.0590  34.65   1.10  -      2.73   123.98  84.14     -5.941
Phenol                     318.15  1.0545  56.32   0.80  0.92*  11.10  202.77  94.111    -27.656
Toluene                    298.15  0.8619  37.99   1.07  0.92   2.37   157.20  91.139    -3.724
Quinoline                  298.15  1.0900  64.10   0.73  0.44   9.00   200.00  129.159   -23.932
Isoquinoline               303.15  1.0910  59.43   0.68  0.44*  10.60  197.45  129.159   -
Nitro-benzene              298.15  1.1987  55.01   0.85  0.51   34.81  177.20  123.11    -17.238
2-chloro-aniline           293.15  1.2100  57.60   -     -      13.40  196.88  127.571   -20.543
Benzenethiol               298.15  1.0730  48.47   0.88  0.50   4.29   173.55  110.177   -10.669
2-methyl-pyridine          298.15  0.9398  42.92   0.99  0.70*  9.95   159.20  93.1262   -19.372
3-methyl-pyridine          298.15  0.9533  45.23   0.97  0.70*  11.64  159.00  93.1262   -19.958
4-methyl-pyridine          298.15  0.9503  44.81   0.96  0.70   11.96  159.00  93.1262   -20.627
Trifluoromethyl-benzene    293.15  1.1779  37.73   1.20  0.92*  9.22   188.80  146.11    -1.046
Benzonitrile               288.15  1.0093  52.14   0.83  0.92*  26.41  163.00  103.121   -17.615
Benzaldehyde               298.15  1.0436  39.60   0.25  0.23   17.40  172.00  106.122   -16.820
Methoxy-benzene            298.15  0.9894  45.00   0.95  0.69   4.22   208.60  108.138   -10.251
Phenyl-methanol            297.15  1.0419  65.59   0.69  0.92*  13.09  216.44  108.137   -27.698
2-methylphenol             308.15  1.0327  56.90   0.79  0.61*  6.44   234.03  108.137   -24.560
3-methylphenol             320.00  1.0123  60.91   0.75  0.61   10.43  233.79  108.137   -22.970
4-methyl-phenol            313.15  1.0185  63.23   0.85  -      11.21  229.41  108.137   -25.648
Ethenyl-benzene            298.15  0.9010  43.93   0.97  0.86   2.46   182.50  104.149   -5.188
1-phenyl-ethanone          298.15  1.0234  53.40   0.84  0.56   17.44  204.60  120.148   -19.163
Ethyl-benzene              298.15  0.8625  42.25   1.02  0.86   2.43   185.50  106.165   -3.305
1,2-dimethyl-benzene       298.15  0.8760  43.43   0.95  0.81   2.55   188.00  106.165   -3.766
1,2-dimethoxy-benzene      298.15  1.0820  48.38   0.93  0.69*  4.41   -       138.163   -
2,4,6-trimethyl-pyridine   295.15  0.9104  50.34   0.83  0.92*  7.81   214.00  121.179   -
(1-methylethyl)-benzene    298.15  0.8573  45.14   0.98  0.98   -      198.90  120.191   -1.255
1,2,4-trimethyl-benzene    298.15  0.8720  47.57   0.90  0.84   2.37   214.94  120.191   -3.598
1-chloro-naphthalene       298.15  1.1880  64.660  0.70  0.49   5.04   222.14  162.615   -
Aniline                    298.15  1.0217  55.83   0.83  0.47   7.06   191.90  93.127    -22.970
Methyl-benzoate            298.15  1.0840  55.57   0.88  0.45   6.64   221.30  136.148   -16.401
Methyl-2-hydroxy-benzoate  298.15  1.1810  61.04   0.70  0.45*  9.47   247.51  152.147   -
Phenoxy-benzene            303.15  1.0661  58.42   0.65  -      3.65   269.87  170.207   -
** References: experimental data extracted from Marcus (1999); Frenkel and Marsh (2003); Lide (1999); Finger et al. (1951); Findlay (1969); Yaws (2003, 2009); Hales and Townsend (1974); Abraham et al. (1990).
* Values from most similar compounds used in simulation of organic liquids.

Table S3. Calculated thermodynamic properties of organic liquids extracted from Caleman et
al (2012) Molecule Name Benzene Pyrroline Furan Fluorobenzene 1,2-fluorobenzene 1.3-fluorobenzene 1,2,3,4-fluorobenzene 1,2,3,5-fluorobenzene Pyridine Pyrimidine Thiophene Phenol Toluene Quinoline Isoquinoline Nitro-benzene 2-chloro-aniline Benzenethiol 2-methyl-pyridine 3-methyl-pyridine 4-methyl-pyridine Trifluoromethyl-benzene Benzonitrile Benzaldehyde Methoxy-benzene Phenyl-methanol 2-methyl-phenol 3-methyl-phenol 4-methyl-phenol Ethenyl-benzene 1-phenyl-ethanone Ethyl-benzene 1,2-dimethyl-benzene 1,2-dimethoxy-benzene 2,4,6-trimethyl-pyridine (1-methylethyl)-benzene 1,2,4-trimethyl-benzene 1-chloro-naphthalene Methyl-benzoate Methyl-2-hydroxy-benzoate Phenoxy-benzene ρ (in g/cm3) GAFF - 1.0201 0.9660 0.9773 1.0993 1.0906 1.2478 1.2319 0.9822 1.1160 1.0500 1.0515 0.8512 1.0972 1.0718 1.2331 1.2392 1.0614 0.9415 0.9442 0.9503 1.1719 0.9893 1.0369 0.9919 1.0452 1.0404 1.0204 0.9989 0.8922 1.0215 0.8531 0.8622 1.0750 0.9081 0.8562 0.8592 1.1910 1.1113 1.1927 1.0730 OPLS-AA - 0.9905 0.9582 1.0214 1.1193 1.1071 1.3483 1.3428 0.9753 1.0945 1.0876 1.0570 0.8720 1.0864 1.0999 1.1744 1.2288 1.0511 0.9480 0.9521 0.9484 1.1910 1.0141 1.0314 0.9807 1.0415 1.0394 1.0222 1.0213 0.9121 1.0260 0.8700 0.8890 1.0570 0.9295 0.8739 0.8888 1.1717 1.0968 1.1795 1.0821 ∆Hvap (in kJ/mol) GAFF - 52.51 30.65 33.46 34.05 33.78 35.45 33.81 41.70 50.47 34.26 53.16 37.39 61.13 62.63 70.34 56.13 43.85 45.34 45.70 46.06 41.51 53.52 52.83 48.86 62.62 63.66 65.20 57.23 42.43 58.76 42.36 42.40 64.77 55.78 46.90 47.39 61.67 64.21 71.92 69.65 OPLS-AA - 44.14 30.12 34.45 35.27 34.03 36.85 37.10 41.72 49.33 39.51 61.26 40.02 60.40 74.94 55.09 57.28 41.43 46.07 47.37 46.40 38.61 54.09 54.47 47.52 62.16 63.93 66.28 67.37 45.02 61.68 44.48 46.08 63.53 56.74 48.71 51.84 61.67 62.22 71.44 72.70 αP (in 10-3/K) GAFF - 0.81 1.51 1.60 1.53 1.36 1.77 1.80 1.14 1.14 1.40 1.15 1.62 0.78 0.84 0.77 0.86 1.38 0.96 0.97 1.29 1.36 1.05 1.06 1.05 0.81 0.96 1.00 0.96 1.14 1.00 1.35 1.40 0.97 0.92 1.43 1.15 0.75 0.72 
0.99 0.74 OPLS-AA - 1.05 1.56 1.20 1.67 1.75 1.52 1.77 1.07 1.04 1.01 0.92 1.49 0.90 0.45 0.82 0.90 1.11 1.14 0.96 0.95 1.35 0.89 0.77 1.22 0.80 0.93 0.98 0.81 1.18 0.89 1.19 1.05 1.00 0.89 0.99 1.23 0.99 0.71 0.80 0.76 κT (in 1/GPa) GAFF - 0.51 1.01 1.26 1.26 1.25 1.84 1.96 0.64 0.41 0.96 0.61 1.06 0.49 0.43 0.34 0.51 0.85 0.72 0.66 0.67 1.16 0.68 0.58 0.66 0.51 0.52 0.56 0.60 0.88 0.54 0.96 0.98 0.50 0.68 0.90 0.90 0.53 0.47 0.37 0.56 OPLS-AA - 0.60 1.04 1.01 1.4 1.44 1.39 1.51 0.64 0.44 0.62 0.54 0.88 0.48 0.27 0.48 0.47 0.83 0.63 0.59 0.60 1.15 0.48 0.48 0.62 0.46 0.50 0.53 0.49 0.74 0.48 0.83 0.74 0.53 0.59 0.79 0.70 0.55 0.47 0.39 0.46 GAFF - 4.20 1.50 3.30 7.20 8.60 1.10 4.00 4.50 25.20 4.70 2.90 5.60 6.60 6.50 16.90 10.90 2.60 5.80 4.20 5.20 3.90 1.10 11.00 1.10 1.20 4.00 1.10 3.90 6.00 - ε OPLS-AA 4.00 1.50 11.3 3.80 5.30 6.70 8.80 2.60 5.80 1.20 4.00 2.20 8.00 8.00 2.40 5.20 7.10 6.60 2.40 7.90 4.20 6.90 7.10 1.00 6.90 1.20 1.50 4.00 4.40 1.20 3.30 3.40 1.70 Cpcla (in J/mol×K) GAFF - 322.00 195.00 247.00 249.00 249.00 265.00 276.00 234.00 222.00 196.00 299.00 309.00 362.00 340.00 304.00 321.00 287.00 275.00 282.00 281.00 316.00 294.00 291.00 338.00 349.00 361.00 356.00 348.00 331.00 392.00 372.00 362.00 441.00 379.00 422.00 424.00 357.00 368.00 401.00 504.00 OPLS-AA - 215.00 188.00 230.00 272.00 260.00 270.00 289.00 232.00 222.00 190.00 305.00 292.00 353.00 319.00 296.00 331.00 277.00 281.00 291.00 283.00 312.00 248.00 289.00 333.00 378.00 372.00 380.00 359.00 333.00 360.00 366.00 360.00 435.00 390.00 407.00 413.00 387.00 380.00 397.00 445.00 4 Polêto et al. Frontiers Table S4. Absolute errors of calculated physical-chemical properties for each organic liquid in calibration set. 
| Molecule Name | ρ This work | ρ GAFF | ρ OPLS-AA | ΔHvap This work | ΔHvap GAFF | ΔHvap OPLS-AA | Cpcla This work | Cpcla GAFF | Cpcla OPLS-AA | αP This work | αP GAFF | αP OPLS-AA | κT This work | κT GAFF | κT OPLS-AA | ε This work | ε GAFF | ε OPLS-AA | ΔGhyd This work |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benzene | 0.83% | - | - | 4.13% | - | - | 88.87% | - | - | 39.12% | - | - | 9.73% | - | - | 52.81% | - | - | 5.51% |
| Pyrroline | 3.12% | 5.68% | 2.61% | 6.98% | 16.30% | 2.24% | 69.66% | 151.17% | 67.71% | 16.60% | 6.90% | 20.69% | 17.42% | 21.54% | 7.69% | 48.33% | 46.97% | 49.49% | 9.90% |
| Furan | 7.04% | 3.73% | 2.89% | 11.58% | 11.62% | 9.69% | 58.95% | 69.86% | 63.76% | 87.26% | 106.85% | 113.70% | - | - | - | 13.35% | 48.98% | 48.98% | - |
| Fluorobenzene | 2.16% | 4.10% | 0.23% | 3.88% | 3.24% | 0.38% | 64.69% | 68.83% | 57.21% | 22.74% | 35.59% | 1.69% | 6.31% | 34.04% | 7.45% | 53.20% | 38.20% | - | 170.51% |
| 1,2-fluorobenzene | 2.66% | 4.41% | 2.67% | 1.09% | 5.70% | 2.33% | 48.62% | 56.60% | 71.07% | 6.90% | 27.50% | 39.17% | - | - | - | 39.80% | - | 16.85% | - |
| 1,3-fluorobenzene | 4.22% | 6.14% | 4.72% | 6.59% | 7.65% | 6.97% | 48.01% | 56.51% | 63.42% | 9.48% | 13.33% | 45.83% | - | - | - | 42.65% | - | 24.90% | - |
| 1,2,3,4-fluorobenzene | 4.62% | 11.88% | 4.79% | 10.90% | 3.17% | 0.66% | 28.49% | 39.43% | 42.06% | - | - | - | - | - | - | - | - | - | - |
| 1,2,3,5-fluorobenzene | 10.38% | 11.56% | 3.60% | 3.81% | 4.49% | 4.80% | 32.38% | 45.12% | 51.95% | - | - | - | - | - | - | - | - | - | - |
| Pyridine | 3.43% | 0.45% | 0.26% | 9.84% | 3.86% | 3.91% | 72.50% | 72.57% | 71.09% | 9.10% | 11.76% | 4.90% | 29.06% | 9.86% | 9.86% | 56.29% | - | 48.38% | 26.01% |
| Pyrimidine | 10.69% | 9.80% | 7.68% | 8.44% | 1.33% | 0.96% | 67.29% | 66.04% | 66.04% | 4.62% | 28.09% | 16.85% | - | - | - | - | - | - | - |
| Thiophene | 4.16% | 0.85% | 2.70% | 1.43% | 1.13% | 14.03% | 48.33% | 58.09% | 53.25% | 11.15% | 27.27% | 8.18% | - | - | - | 45.98% | - | 4.76% | 70.04% |
| Phenol | 2.43% | 0.28% | 0.24% | 6.24% | 5.61% | 8.77% | 48.00% | 47.46% | 50.42% | 13.81% | 43.75% | 15.00% | - | - | - | 41.68% | - | 47.75% | 16.87% |
| Toluene | 1.00% | 1.24% | 1.17% | 3.39% | 1.58% | 5.34% | 54.78% | 96.56% | 85.75% | 20.48% | 51.40% | 39.25% | 3.16% | 15.22% | 4.35% | 53.21% | 53.59% | 49.37% | 25.95% |
| Quinoline | 0.65% | 0.66% | 0.33% | 2.43% | 4.63% | 5.77% | 66.24% | 81.00% | 76.50% | 3.86% | 6.85% | 23.29% | 15.25% | 11.36% | 9.09% | 11.98% | 55.56% | 55.56% | 4.08% |
| Isoquinoline | 0.43% | 1.76% | 0.82% | 5.73% | 5.38% | 26.10% | 72.50% | 72.20% | 61.56% | 22.21% | 23.53% | 33.82% | - | - | - | 70.28% | 57.55% | 79.25% | - |
| Nitro-benzene | 4.19% | 2.87% | 2.03% | 6.25% | 27.87% | 0.15% | 58.14% | 71.56% | 67.04% | 8.29% | 9.41% | 3.53% | 35.37% | 33.33% | 5.88% | 90.01% | 27.61% | 77.02% | 106.73% |
| 2-chloro-aniline | 0.47% | 2.41% | 1.55% | 6.56% | 2.55% | 0.56% | 78.37% | 63.04% | 68.12% | - | - | - | - | - | - | 70.81% | 64.93% | 40.30% | 54.97% |
| Benzenethiol | 5.28% | 1.08% | 2.04% | 2.60% | 9.53% | 14.52% | 55.65% | 65.37% | 59.61% | 12.17% | 56.82% | 26.14% | 4.04% | 70.00% | 66.00% | 27.33% | 32.40% | 44.06% | 8.91% |
| 2-methyl-pyridine | 3.49% | 0.18% | 0.87% | 9.23% | 5.64% | 7.34% | 45.82% | 72.74% | 76.51% | 1.87% | 3.03% | 15.15% | - | - | - | 51.13% | - | 47.74% | 17.66% |
| 3-methyl-pyridine | 1.90% | 0.95% | 0.13% | 3.70% | 1.04% | 4.73% | 46.55% | 77.36% | 83.02% | 2.96% | 0.00% | 1.03% | - | - | - | 58.53% | 51.89% | 39.00% | 27.25% |
| 4-methyl-pyridine | 2.93% | 0.00% | 0.20% | 6.56% | 2.79% | 3.55% | 46.50% | 76.73% | 77.99% | 1.07% | 34.38% | 1.04% | 20.69% | 4.29% | 14.29% | 53.78% | 44.82% | - | 20.30% |
| Trifluoromethyl-benzene | 3.98% | 0.51% | 1.11% | 12.92% | 10.02% | 2.33% | 54.23% | 67.37% | 65.25% | 7.46% | 13.33% | 12.50% | - | - | - | 74.96% | 29.50% | - | 630.59% |
| Benzonitrile | 1.08% | 1.98% | 0.48% | 5.04% | 2.65% | 3.74% | 56.75% | 80.37% | 52.15% | 17.40% | 26.51% | 7.23% | - | - | - | 48.85% | 36.01% | 75.01% | 14.39% |
| Benzaldehyde | 0.01% | 0.64% | 1.17% | 9.69% | 33.41% | 37.55% | 62.84% | 69.19% | 68.02% | 299.00% | 324.00% | 208.00% | 145.52% | 152.17% | 108.70% | 54.91% | 37.36% | - | 36.38% |
| Methoxy-benzene | 1.84% | 0.25% | 0.88% | 3.24% | 8.58% | 5.60% | 26.64% | 62.03% | 59.64% | 5.68% | 10.53% | 28.42% | 20.81% | 4.35% | 10.14% | 49.52% | 38.39% | 43.13% | 56.88% |
| Phenyl-methanol | 0.01% | 0.32% | 0.04% | 5.49% | 4.53% | 5.23% | 45.42% | 61.25% | 74.64% | 19.41% | 17.39% | 15.94% | - | - | - | 39.30% | 55.69% | 39.65% | 4.98% |
| 2-methyl-phenol | 2.46% | 0.75% | 0.65% | 7.74% | 11.88% | 12.36% | 31.31% | 54.25% | 58.95% | 11.57% | 21.52% | 17.72% | - | - | - | 26.39% | 34.78% | 34.78% | 11.60% |
| 3-methyl-phenol | 2.85% | 0.80% | 0.98% | 5.29% | 7.04% | 8.82% | 27.54% | 52.27% | 62.54% | 9.69% | 33.33% | 30.67% | 18.26% | 8.20% | 13.11% | 42.35% | 50.14% | 33.84% | 5.40% |
| 4-methyl-phenol | 2.78% | 1.92% | 0.27% | 2.54% | 9.49% | 6.55% | 31.43% | 51.69% | 56.49% | 2.67% | 12.94% | 4.71% | - | - | - | 46.56% | 65.21% | 36.66% | 6.82% |
| Ethenyl-benzene | 22.74% | 0.98% | 1.23% | 37.54% | 3.41% | 2.48% | 40.31% | 81.37% | 82.47% | 88.23% | 17.53% | 21.65% | 250.40% | 2.33% | 13.95% | 57.61% | 55.28% | 59.35% | 135.66% |
| 1-phenyl-ethanone | 0.32% | 0.19% | 0.25% | 1.96% | 10.04% | 15.51% | 39.60% | 91.59% | 75.95% | 6.58% | 19.05% | 5.95% | 13.71% | 3.57% | 14.29% | 58.40% | 36.93% | 60.44% | 43.48% |
| Ethyl-benzene | 0.71% | 1.09% | 0.87% | 2.02% | 0.26% | 5.28% | 39.81% | 100.54% | 97.30% | 7.85% | 32.35% | 16.67% | 6.73% | 11.63% | 3.49% | 42.50% | 54.73% | 50.62% | 120.85% |
| 1,2-dimethyl-benzene | 0.98% | 1.58% | 1.48% | 3.35% | 2.37% | 6.10% | 27.62% | 92.55% | 91.49% | 10.46% | 47.37% | 10.53% | 3.14% | 20.99% | 8.64% | 53.60% | 52.94% | 41.18% | 84.03% |
| 1,2-dimethoxy-benzene | 1.63% | 0.65% | 2.31% | 27.39% | 33.88% | 31.31% | - | - | - | 11.74% | 4.30% | 7.53% | - | - | - | 15.99% | - | 9.30% | - |
| 2,4,6-trimethyl-pyridine | 0.51% | 0.25% | 2.10% | 4.86% | 10.81% | 12.71% | 5.76% | 77.10% | 82.24% | 16.48% | 10.84% | 7.23% | - | - | - | 53.60% | 48.78% | 43.66% | - |
| (1-methylethyl)-benzene | 0.52% | 0.13% | 1.94% | 5.48% | 3.90% | 7.91% | 40.69% | 112.17% | 104.63% | 9.16% | 45.92% | 1.02% | 16.57% | 8.16% | 19.39% | - | - | - | 123.07% |
| 1,2,4-trimethyl-benzene | 1.36% | 1.47% | 1.93% | 1.38% | 0.38% | 8.98% | 11.66% | 97.26% | 92.15% | 3.18% | 27.78% | 36.67% | 15.63% | 7.14% | 16.67% | 53.65% | 53.59% | 49.37% | 129.28% |
| 1-chloro-naphthalene | 0.64% | 0.25% | 1.37% | 2.78% | 4.62% | 4.62% | 53.79% | 60.71% | 74.21% | 11.09% | 7.14% | 41.43% | 7.73% | 8.16% | 12.24% | 95.05% | - | 34.52% | - |
| Aniline | 1.87% | - | - | 4.80% | - | - | 82.38% | - | - | 46.41% | - | - | 36.42% | - | - | 35.93% | - | - | 41.53% |
| Methyl-benzoate | 4.00% | 2.52% | 1.18% | 7.26% | 15.55% | 11.97% | 37.09% | 66.29% | 71.71% | 5.32% | 18.18% | 19.32% | 6.07% | 4.44% | 4.44% | 65.94% | 41.27% | 48.80% | 39.64% |
| Methyl-2-hydroxy-benzoate | 4.77% | 0.99% | 0.13% | 13.75% | 17.82% | 17.04% | 33.16% | 62.01% | 60.40% | 5.80% | 41.43% | 14.29% | - | - | - | 42.38% | 36.64% | - | - |
| Phenoxy-benzene | 1.77% | 0.65% | 1.50% | 16.49% | 19.22% | 24.44% | 68.74% | 86.76% | 64.89% | 42.40% | 13.85% | 16.92% | - | - | - | 47.20% | - | 53.42% | - |
| Average absolute error | 3.16% | 2.20% | 1.58% | 7.20% | 8.37% | 8.83% | 49.20% | 72.69% | 69.47% | 23.88% | 33.29% | 25.23% | 32.48% | 22.67% | 18.40% | 49.63% | 46.29% | 44.75% | 68.31% |
| Standard Deviation | 3.92% | 2.97% | 1.54% | 6.82% | 8.31% | 8.47% | 18.49% | 20.59% | 13.57% | 49.33% | 53.04% | 36.87% | 58.27% | 35.26% | 25.68% | 17.60% | 10.46% | 16.81% | 116.19% |

Table S5. Deviation of calculated physical-chemical properties for each organic liquid in calibration set.
| Molecule Name | ρ This work | ρ GAFF | ρ OPLS-AA | ΔHvap This work | ΔHvap GAFF | ΔHvap OPLS-AA | Cpcla This work | Cpcla GAFF | Cpcla OPLS-AA | αP This work | αP GAFF | αP OPLS-AA | κT This work | κT GAFF | κT OPLS-AA | ε This work | ε GAFF | ε OPLS-AA | ΔGhyd This work |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benzene | 0.0073 | - | - | -1.4061 | - | - | 120.60 | - | - | 0.48 | - | - | 0.09 | - | - | -1.20 | - | - | 0.20 |
| Pyrroline | 0.0301 | 0.0548 | 0.0252 | 3.1529 | 7.3600 | -1.0100 | 89.30 | 193.80 | 86.80 | 0.14 | -0.06 | 0.18 | -0.11 | -0.14 | -0.05 | -3.83 | -3.72 | -3.92 | 1.98 |
| Furan | 0.0656 | 0.0347 | 0.0269 | 3.1798 | 3.1900 | 2.6600 | 67.67 | 80.20 | 73.20 | 0.64 | 0.78 | 0.83 | - | - | - | -0.39 | -1.44 | -1.44 | - |
| Fluorobenzene | -0.0220 | -0.0418 | 0.0023 | -1.3418 | -1.1200 | -0.1300 | 94.64 | 100.70 | 83.70 | 0.27 | 0.42 | 0.02 | 0.06 | 0.32 | 0.07 | -1.99 | -2.04 | - | 5.71 |
| 1,2-fluorobenzene | -0.0306 | -0.0507 | -0.0307 | -0.3940 | -2.0600 | -0.8400 | 77.31 | 90.00 | 113.00 | 0.08 | 0.33 | 0.47 | - | - | - | -5.41 | - | -2.29 | - |
| 1,3-fluorobenzene | -0.0490 | -0.0714 | -0.0549 | -2.4089 | -2.8000 | -2.5500 | 76.38 | 89.90 | 100.90 | 0.11 | 0.16 | 0.55 | - | - | - | -2.16 | - | -1.26 | - |
| 1,2,3,4-fluorobenzene | -0.0654 | -0.1683 | -0.0678 | 3.9895 | -1.1600 | 0.2400 | 54.15 | 74.94 | 79.94 | - | - | - | - | - | - | - | - | - | - |
| 1,2,3,5-fluorobenzene | -0.1446 | -0.1611 | -0.0502 | 1.3497 | -1.5900 | 1.7000 | 61.58 | 85.81 | 98.81 | - | - | - | - | - | - | - | - | - | - |
| Pyridine | 0.0335 | 0.0044 | -0.0025 | 3.9512 | 1.5500 | 1.5700 | 98.31 | 98.40 | 96.40 | 0.09 | 0.12 | 0.05 | -0.21 | -0.07 | -0.07 | -7.31 | - | -6.28 | 5.10 |
| Pyrimidine | 0.1087 | 0.0996 | 0.0781 | 4.2050 | 0.6600 | -0.4800 | 89.96 | 88.30 | 88.30 | 0.04 | 0.25 | 0.15 | - | - | - | - | - | - | - |
| Thiophene | 0.0441 | -0.0090 | 0.0286 | -0.4944 | -0.3900 | 4.8600 | 59.91 | 72.02 | 66.02 | 0.12 | 0.30 | -0.09 | - | - | - | -1.26 | - | -0.13 | 4.16 |
| Phenol | 0.0256 | -0.0030 | 0.0025 | 3.5169 | -3.1600 | 4.9400 | 97.33 | 96.23 | 102.23 | 0.11 | 0.35 | 0.12 | - | - | - | -4.63 | - | -5.30 | 4.67 |
| Toluene | 0.0086 | -0.0107 | 0.0101 | -1.2864 | -0.6000 | 2.0300 | 86.11 | 151.80 | 134.80 | 0.22 | 0.55 | 0.42 | -0.03 | 0.14 | -0.04 | -1.26 | -1.27 | -1.17 | -0.97 |
| Quinoline | 0.0071 | 0.0072 | -0.0036 | 1.5583 | -2.9700 | -3.7000 | 132.49 | 162.00 | 153.00 | 0.03 | 0.05 | 0.17 | -0.07 | 0.05 | 0.04 | -1.08 | -5.00 | -5.00 | -0.98 |
| Isoquinoline | 0.0047 | -0.0192 | 0.0089 | 3.4044 | 3.2000 | 15.5100 | 143.15 | 142.55 | 121.55 | 0.15 | 0.16 | -0.23 | - | - | - | -7.45 | -6.10 | -8.40 | - |
| Nitro-benzene | 0.0502 | 0.0344 | -0.0243 | 3.4378 | 15.3300 | 0.0800 | 103.02 | 126.80 | 118.80 | -0.07 | -0.08 | -0.03 | -0.18 | -0.17 | -0.03 | -31.33 | -9.61 | -26.81 | 18.40 |
| 2-chloro-aniline | -0.0057 | 0.0292 | 0.0188 | -3.7793 | -1.4700 | -0.3200 | 154.29 | 124.12 | 134.12 | - | - | - | - | - | - | -9.49 | -8.70 | -5.40 | 11.29 |
| Benzenethiol | 0.0567 | -0.0116 | -0.0219 | 1.2586 | -4.6200 | -7.0400 | 96.59 | 113.45 | 103.45 | 0.11 | 0.50 | 0.23 | 0.02 | 0.35 | 0.33 | 1.17 | -1.39 | -1.89 | -0.95 |
| 2-methyl-pyridine | 0.0328 | 0.0017 | 0.0082 | 3.9614 | 2.4200 | 3.1500 | 72.95 | 115.80 | 121.80 | 0.02 | -0.03 | 0.15 | - | - | - | -5.09 | - | -4.75 | 3.42 |
| 3-methyl-pyridine | 0.0181 | -0.0091 | -0.0012 | 1.6721 | 0.4700 | 2.1400 | 74.01 | 123.00 | 132.00 | 0.03 | 0.00 | -0.01 | - | - | - | -6.81 | -6.04 | -4.54 | 5.44 |
| 4-methyl-pyridine | 0.0278 | 0.0000 | -0.0019 | 2.9415 | 1.2500 | 1.5900 | 73.94 | 122.00 | 124.00 | 0.01 | 0.33 | -0.01 | -0.14 | -0.03 | -0.10 | -6.43 | -5.36 | - | 4.19 |
| Trifluoromethyl-benzene | 0.0469 | -0.0060 | 0.0131 | 4.8756 | 3.7800 | 0.8800 | 102.38 | 127.20 | 123.20 | -0.09 | 0.16 | 0.15 | - | - | - | -6.91 | -2.72 | - | 6.60 |
| Benzonitrile | 0.0109 | -0.0200 | 0.0048 | 2.6281 | 1.3800 | 1.9500 | 92.51 | 131.00 | 85.00 | 0.14 | 0.22 | 0.06 | - | - | - | -12.90 | -9.51 | -19.81 | -2.54 |
| Benzaldehyde | 0.0001 | -0.0067 | -0.0122 | 3.8368 | 13.2300 | 14.8700 | 108.08 | 119.00 | 117.00 | 0.75 | 0.81 | 0.52 | 0.33 | 0.35 | 0.25 | -9.55 | -6.50 | - | 6.12 |
| Methoxy-benzene | 0.0182 | 0.0025 | -0.0087 | 1.4576 | 3.8600 | 2.5200 | 55.58 | 129.40 | 124.40 | 0.05 | 0.10 | 0.27 | -0.14 | -0.03 | -0.07 | -2.09 | -1.62 | -1.82 | 5.83 |
| Phenyl-methanol | -0.0001 | 0.0033 | -0.0004 | 3.6010 | -2.9700 | -3.4300 | 98.30 | 132.56 | 161.56 | 0.13 | 0.12 | 0.11 | - | - | - | -5.14 | -7.29 | -5.19 | 1.38 |
| 2-methyl-phenol | 0.0254 | 0.0077 | 0.0067 | 4.4056 | 6.7600 | 7.0300 | 73.28 | 126.97 | 137.97 | 0.09 | 0.17 | 0.14 | - | - | - | -1.70 | -2.24 | -2.24 | 2.85 |
| 3-methyl-phenol | 0.0289 | 0.0081 | 0.0099 | 3.2238 | 4.2900 | 5.3700 | 64.38 | 122.21 | 146.21 | 0.07 | 0.25 | 0.23 | -0.11 | -0.05 | -0.08 | -4.42 | -5.23 | -3.53 | -1.24 |
| 4-methyl-phenol | 0.0283 | -0.0196 | 0.0028 | 1.6055 | -6.0000 | 4.1400 | 72.11 | 118.59 | 129.59 | -0.02 | 0.11 | -0.04 | - | - | - | -5.22 | -7.31 | -4.11 | 1.75 |
| Ethenyl-benzene | -0.2049 | -0.0088 | 0.0111 | -16.4894 | -1.5000 | 1.0900 | 73.57 | 148.50 | 150.50 | 0.86 | 0.17 | 0.21 | 2.15 | 0.02 | -0.12 | -1.42 | -1.36 | -1.46 | 7.04 |
| 1-phenyl-ethanone | 0.0033 | -0.0019 | 0.0026 | 1.0490 | 5.3600 | 8.2800 | 81.02 | 187.40 | 155.40 | 0.06 | 0.16 | 0.05 | -0.08 | -0.02 | -0.08 | -10.19 | -6.44 | -10.54 | 8.33 |
| Ethyl-benzene | 0.0061 | -0.0094 | 0.0075 | -0.8529 | 0.1100 | 2.2300 | 73.85 | 186.50 | 180.50 | 0.08 | 0.33 | 0.17 | -0.06 | 0.10 | -0.03 | -1.03 | -1.33 | -1.23 | -3.99 |
| 1,2-dimethyl-benzene | 0.0086 | -0.0138 | 0.0130 | -1.4570 | -1.0300 | 2.6500 | 51.93 | 174.00 | 172.00 | 0.10 | 0.45 | 0.10 | -0.03 | 0.17 | -0.07 | -1.37 | -1.35 | -1.05 | -3.16 |
| 1,2-dimethoxy-benzene | 0.0176 | -0.0070 | -0.0250 | 13.2495 | 16.3900 | 15.1500 | - | - | - | -0.11 | 0.04 | 0.07 | - | - | - | -0.71 | - | -0.41 | - |
| 2,4,6-trimethyl-pyridine | 0.0046 | -0.0023 | 0.0191 | -2.4450 | 5.4400 | 6.4000 | 12.32 | 165.00 | 176.00 | 0.14 | 0.09 | 0.06 | - | - | - | -4.19 | -3.81 | -3.41 | - |
| (1-methylethyl)-benzene | -0.0045 | -0.0011 | 0.0166 | -2.4746 | 1.7600 | 3.5700 | 80.93 | 223.10 | 208.10 | 0.09 | 0.45 | 0.01 | -0.16 | -0.08 | -0.19 | - | - | - | -1.54 |
| 1,2,4-trimethyl-benzene | 0.0119 | -0.0128 | 0.0168 | -0.6566 | -0.1800 | 4.2700 | 25.06 | 209.06 | 198.06 | 0.03 | 0.25 | 0.33 | -0.13 | 0.06 | -0.14 | -1.27 | -1.27 | -1.17 | -4.65 |
| 1-chloro-naphthalene | -0.0076 | 0.0030 | -0.0163 | -1.7993 | -2.9900 | -2.9900 | 119.49 | 134.86 | 164.86 | 0.08 | 0.05 | 0.29 | -0.04 | 0.04 | 0.06 | 4.79 | - | -1.74 | - |
| Aniline | 0.0191 | - | - | -2.6816 | - | - | 158.09 | - | - | 0.39 | - | - | -0.17 | - | - | -2.54 | - | - | 9.54 |
| Methyl-benzoate | 0.0434 | 0.0273 | 0.0128 | 4.0361 | 8.6400 | 6.6500 | 82.07 | 146.70 | 158.70 | -0.05 | -0.16 | -0.17 | -0.03 | 0.02 | 0.02 | -4.38 | -2.74 | -3.24 | 6.50 |
| Methyl-2-hydroxy-benzoate | 0.0563 | 0.0117 | -0.0015 | 8.3921 | 10.8800 | 10.4000 | 82.08 | 153.49 | 149.49 | 0.04 | 0.29 | 0.10 | - | - | - | -4.01 | -3.47 | - | - |
| Phenoxy-benzene | 0.0189 | 0.0069 | 0.0160 | 9.6336 | 11.2300 | 14.2800 | 185.50 | 234.13 | 175.13 | 0.28 | 0.09 | 0.11 | - | - | - | -1.72 | - | -1.95 | - |
| Mean Deviation | 0.0080 | -0.0082 | 0.0010 | 1.5144 | 2.2983 | 3.2428 | 88.2009 | 133.8844 | 129.3972 | 0.1458 | 0.2238 | 0.1551 | 0.0465 | 0.0542 | -0.0158 | -4.5236 | -4.2541 | -4.5639 | 3.3488 |
| Standard Deviation | 0.0515 | 0.0454 | 0.0249 | 4.4567 | 5.4193 | 5.2161 | 33.4397 | 40.2247 | 35.3271 | 0.21 | 0.22 | 0.21 | 0.50 | 0.15 | 0.13 | 5.65 | 2.74 | 5.60 | 5.02 |
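The entries of Tables S4 and S5 relate a calculated property to its experimental reference. A minimal Python sketch of that arithmetic follows, assuming the deviation is calculated minus experimental and the absolute error is the deviation relative to the experimental value; the function names are illustrative, and the use of the sample standard deviation (n − 1) is an assumption, since the tables do not state which estimator was used. The worked values are the experimental and GAFF densities of pyrroline (0.9653 and 1.0201 g/cm3).

```python
import math


def deviation(calculated, experimental):
    """Signed deviation: calculated minus experimental."""
    return calculated - experimental


def absolute_percent_error(calculated, experimental):
    """Absolute error relative to the experimental value, in percent."""
    return abs(calculated - experimental) / abs(experimental) * 100.0


def mean_and_std(values):
    """Mean and sample standard deviation, skipping missing (None) entries.

    The n - 1 denominator is an assumption of this sketch.
    """
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    variance = sum((v - mean) ** 2 for v in present) / (len(present) - 1)
    return mean, math.sqrt(variance)


# Pyrroline density: experimental value vs. the GAFF value.
exp_rho, gaff_rho = 0.9653, 1.0201
dev = deviation(gaff_rho, exp_rho)
err = absolute_percent_error(gaff_rho, exp_rho)
print(f"deviation = {dev:.4f} g/cm3, absolute error = {err:.2f}%")
# prints: deviation = 0.0548 g/cm3, absolute error = 5.68%
```

Both printed numbers agree with the GAFF density entries listed for pyrroline in the deviation and absolute-error tables.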
5.2 Chapter II

To understand the conformational dynamics of small molecules free in solution, and building on the methodology developed and applied in Chapter I, new topological parameters were generated for a series of chalcones and flavonoids with potential therapeutic uses, as described in sections 4.3, 4.4, 4.5, and 4.6. After parametrizing the chemical groups of interest and validating the topological terms by comparison with NOESY data, molecular dynamics simulations were performed to evaluate the impact of glycosylations commonly observed in chalcones and flavonoids on their respective conformational populations. It was possible to measure the change in the flexibility of the dihedrals present in the natural products, a change caused by the presence of proximal carbohydrate monomers. It was also possible to measure the impact that organic and aqueous solvents can have on the conformational description of natural products, especially when glycosylated. The results presented in this article provide important theoretical foundations for understanding the conformational dynamics of small molecules free in solution. Beyond the methodological challenge of proposing systematic analysis methods for such systems (described in section 4.9.2), characterizing the conformational populations of molecules with potential therapeutic use can provide useful information for the rational design of new drugs based on the free and receptor-bound conformations.

Cite This: J. Phys. Chem. B XXXX, XXX, XXX−XXX. Article. pubs.acs.org/JPCB

Development of GROMOS-Compatible Parameter Set for Simulations of Chalcones and Flavonoids

Pablo R. Arantes,† Marcelo D. Polêto,† Elisa B. O. John,† Conrado Pedebos,†,‡,§ Bruno I. Grisci,∥ Marcio Dorn,∥ and Hugo Verli*,†

†Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS 91500-970, Brazil
‡School of Pharmacy, University of Nottingham, University Park, Nottingham, U.K.
§CAPES Foundation, Ministry of Education of Brazil, Brasília, 70040-020, Brazil
∥Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS 91501-970, Brazil

ABSTRACT: Chalcones and flavonoids constitute a large family of plant secondary metabolites that have been explored as a potential source of novel pharmaceutical products. While the simulation of these compounds by molecular dynamics (MD) can be a valuable strategy to assess their conformational properties and so further develop their role in drug discovery, there is no set of force field parameters specifically designed and experimentally validated for their conformational description in the condensed phase. The current work therefore developed a new parameter set for MD simulations of these compounds' main scaffolds under the GROMOS force field. We employed a protocol that adjusts the atomic charges and torsional parameters to the quantum mechanically derived dipole moments and dihedral rotational profiles, respectively. Experimental properties of organic liquids were used as references for the calculated values to validate the parameters. Additionally, metadynamics simulations were performed to evaluate the conformational space of complex chalcones and flavonoids, while NOE contacts during simulations were measured and compared to experimental data. Accordingly, the employed protocol allowed us to obtain force field parameters that reproduce the target data well and may be expected to contribute to more accurate computational studies on the biological/therapeutic role of such molecules.
■ INTRODUCTION

Plant secondary metabolites have been studied for many years as potential novel therapeutic agents,1,2 possibly inspired by a long history of application of herbal extracts in traditional folk medicine.2,3 Among them, chalcones and flavonoids have consistently drawn attention, mainly because of their extensive range of biological activities, such as cytotoxic,4−6 antioxidant,7 chemopreventive,8 antimicrobial,9 or inhibitory effects against enzymes of medical relevance,10−12 which makes these molecules appealing for exploration in medicinal chemistry. The classical chalcone scaffold consists of two phenyl rings joined by an α,β-unsaturated ketone system. This chemical signature is believed to play an important role in the biological activity of chalcones, since the unsaturated functional group can act as an acceptor in Michael reactions and so be readily modified when interacting with several compounds.13 Flavonoids are related molecules, derived from chalconoid precursors that undergo cyclization in the α,β-unsaturated ketone system, resulting in a heterocyclic ring connecting the other two phenyl rings.
Changes in their structures have proven useful for the development of new therapeutic candidates, increasing the pharmaceutical interest in these biomolecules, which have been intensively studied and modified.14−16 In this process of lead optimization, computational methods provide insightful information to rationalize, model, and predict new chemical entities and their pharmacological properties.17,18 Among those methods, molecular dynamics (MD) simulations can be used to anticipate, complement, or explain experimental data,19,20 providing detailed conformational distributions as a function of both time and space for the compounds of interest, as well as for their respective target receptors.21,22 Although simulations are able to offer unique, atomic-level information about the dynamical recognition of drugs by biological receptors and the consequent signal transduction, reliable results from MD simulations depend on, among other factors, the quality and accuracy of the empirical potential energy functions used in such calculations. Thus, a novel parameter set associated with a new compound requires careful calibration23 in order to reproduce proper energies of interaction and conformational profiles in the condensed phase. While parameters for biomacromolecules are widely available, the chemical diversity of synthetic compounds and natural products constitutes a real challenge to classic force field (FF) based calculations. The absence of calibrated parameters for natural products or synthetic compounds has increased the use of automated topology generators in atomic-level investigations based on MD simulations of ligand−receptor complexes. The accessibility and ease of such an approach contrast with its nonspecific torsional parameters and with atomic partial charges based on in vacuo quantum calculations rather than on a set calibrated to reproduce energies in the condensed phase. In this sense, the GROMOS force field has provided rather good parametrization strategies to calibrate torsional barriers and profiles as well as atomic partial charges of organic molecules in order to reproduce not only condensed-phase physicochemical properties but also the conformational profile of small molecules in solution.24−26

In this context, and considering the relevance of the chalcone and flavonoid families of molecules as scaffolds for medicinal chemistry, the current work intends to provide a new parameter set for the simulation of such compounds using classic force field calculations, considering their most common chemical modifications. The GROMOS family of force fields was selected for the parametrization strategy because it is adjusted to reproduce condensed-phase properties.27 Accordingly, the molecular mechanical (MM) torsional profiles were fitted to the quantum mechanical (QM) derived ones. For parametrization of the partial charges, the MM atomic charges were fitted to the QM dipole moments and later submitted to validation against thermodynamic properties of organic liquids.25 The original approach of GROMOS is to empirically adjust the atomic partial charges in order to reproduce thermodynamic properties. However, as stated by Riniker,28 there is "an infinite number of charge distributions, which can reproduce the electrostatic potential (ESP) outside a surface encapsulating all charges", which leads us to use a hybrid approach that both preserves the QM dipole moment direction and reproduces thermodynamic properties. On the basis of their accuracy, we explored the conformational description of chalcones and flavonoids in aqueous and nonaqueous solutions, comparing the obtained sampling to NMR data (NOESY). The characterization of the conformational ensemble of compounds in solution is an important step toward a deeper understanding of the determinants of their biological activity and, consequently, toward a more efficient design of new bioactive molecules. We expect that such parameters will be able to properly describe the conformational distribution of chalcones and flavonoids, a starting point for further studies on the biological role of such molecules at an atomistic level of detail.

Received: October 17, 2018. Revised: January 9, 2019. Published: January 9, 2019. DOI: 10.1021/acs.jpcb.8b10139

■ RESULTS AND DISCUSSION

Torsional Potentials and Force Field Calibration. There are several drawbacks in using automated topology builders in the simulations of small bioactive compounds.23 Although the ATB server29 has recently demonstrated reasonable accuracy in predicting free enthalpies of hydration,30 proper torsional designations are still challenging, since they are based on mathematical descriptors of terms already present in the original force field rather than on the chemical environment created by the atoms involved in the dihedral.31 On the other hand, a proper set of atomic partial charges plays an important role in describing accurate inter- and intramolecular interactions, which may directly impact the conformational description of small bioactive compounds in simulations. Hence, several small molecules resembling fragments of the structure of chalcones, flavonoids, and their substituents were selected to act as building blocks for the later assembly of complete compounds. These so-called fragments (molecules 1−9 in Figures S1 and 2) had their topologies built with new atomic charges, empirically adjusted to fit into charge groups while maintaining the molecular polarity, as observed in comparisons of dipole moments from ESP-MP2/6-31G* calculations.

The derived atomic charges were validated through comparison to experimental thermodynamic properties of the condensed phase (ρ and ΔHvap), as reported in other works involving the parametrization of molecules for the GROMOS force field24,32−34 and for force field benchmarking.24,25,35 Individually, most of the parametrized molecules obtained values in good agreement with the experimental data (Table 1). One particular outlier was fragment 2, which yielded higher absolute errors despite calibration efforts to reduce them. Our QM calculations revealed a dipole moment of 0.2 D, which suggests a low charge polarity for fragment 2. Yet our MM calculations yielded underestimated values of density and enthalpy of vaporization, most likely due to the lack of π−π interactions and resonance effects, which are inherently misrepresented in MM calculations. A similar absence of interactions is expected to influence the description of fragment 8 properties.

Table 1. Obtained Values for Thermodynamic Properties of the Simulated Fragments as Organic Liquids(a)

| fragment | temp [K] | exp ρ [g/cm3] | calc ρ [g/cm3] | error [%] | exp ΔHvap [kJ/mol] | calc ΔHvap [kJ/mol] | error [%] |
|---|---|---|---|---|---|---|---|
| 1 | 298.15 | 1.02 | 1.03 | 0.32 | 53.40 | 54.45 | 1.96 |
| 2 | 298.15 | 0.90 | 0.70 | 22.74 | 43.93 | 27.44 | 37.53 |
| 3 | 298.15 | 0.98 | 0.87 | 1.84 | 45.00 | 46.46 | 3.24 |
| 4 | 298.15 | 1.04 | 1.04 | 0.01 | 39.60 | 43.43 | 9.69 |
| 5 | 298.15 | 0.86 | 0.86 | 0.71 | 42.25 | 41.39 | 2.02 |
| 6 | 318.15 | 1.05 | 1.08 | 2.43 | 56.32 | 59.83 | 6.24 |
| 7 | 298.15 | 0.79 | 0.81 | 2.69 | 29.63 | 31.25 | 5.47 |
| 8 | 298.15 | 0.84 | 0.61 | 27.88 | 30.9** | 31.33 | 1.05 |

(a) References: experimental data extracted from refs 36−39. ** indicates experimental value for 314 K.

Therefore, the calibrated charge groups were used to build the scaffolds of chalcones and flavonoids, or compounds 1 and 2, respectively (Figure 1). The basic C−H groups within benzene rings were set as −0.13/+0.13, respectively,25 if not part of other substituent charge groups. Previously calibrated partial atomic charges from phenol and methoxybenzene were used for common substituents in natural products. In the case of vicinal substituents and overlapping atoms, the partial charges of such atoms were set as flexible in order to allow the MM dipole moment to be adjusted to the QM reference while maintaining the core of the charge group.

Figure 1. Colored charge groups used in this work for basic chalcones and flavonoids. Common substituents are also shown. Atomic partial charges marked with ∗ stand for charges that are allowed to be modified when superimposing vicinal substituents.

As a next step, we evaluated how the torsional profiles obtained by QM for the selected building blocks were reproduced by the closest terms present in the GROMOS53a6 force field (Figure 2). For fragments 1, 2, 4, and 5, the torsional parameters tested were extracted from phenylalanine; parameters for the phenol dihedral were tested for fragments 3 and 6, while parameters from aspartate were tested for fragments 7 and 8.
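The comparison described above, between a QM torsional scan and an MM dihedral term, can be illustrated with a short Python sketch. It assumes a generic periodic dihedral of the form k(1 + cos(nφ − φ0)), which is a common textbook form and not necessarily the exact expression or fitting procedure used by the authors. The function names and the synthetic "reference" profile are hypothetical, and in practice the target of such a fit is usually the QM profile minus the MM energy computed without the torsional term.

```python
import numpy as np


def periodic_dihedral_energy(phi_deg, terms):
    """Energy of a sum of periodic terms k * (1 + cos(n*phi - phi0)).

    `terms` is a list of (k, n, phi0_deg) tuples; k in kJ/mol, angles in degrees.
    """
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    energy = np.zeros_like(phi)
    for k, n, phi0_deg in terms:
        energy += k * (1.0 + np.cos(n * phi - np.radians(phi0_deg)))
    return energy


def profile_rmsd(reference, model):
    """RMSD between two profiles after shifting each global minimum to zero."""
    ref = np.asarray(reference, dtype=float) - np.min(reference)
    mod = np.asarray(model, dtype=float) - np.min(model)
    return float(np.sqrt(np.mean((ref - mod) ** 2)))


def fit_force_constants(phi_deg, reference, shapes):
    """Linear least-squares fit of the force constants k_j.

    `shapes` lists the (n, phi0_deg) pair of each term; multiplicities and
    phases are kept fixed, and a constant energy offset is fitted alongside.
    """
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    columns = [np.ones_like(phi)]
    for n, phi0_deg in shapes:
        columns.append(1.0 + np.cos(n * phi - np.radians(phi0_deg)))
    design = np.column_stack(columns)
    solution, *_ = np.linalg.lstsq(design, np.asarray(reference, dtype=float), rcond=None)
    return solution[1:], solution[0]  # force constants, offset


# Synthetic stand-in for a scanned profile (hypothetical numbers, not QM data).
phi_grid = np.arange(-180.0, 180.0, 15.0)
reference = periodic_dihedral_energy(phi_grid, [(5.0, 2, 180.0), (1.5, 1, 0.0)])

k_fit, offset = fit_force_constants(phi_grid, reference, [(2, 180.0), (1, 0.0)])
model = periodic_dihedral_energy(phi_grid, [(k_fit[0], 2, 180.0), (k_fit[1], 1, 0.0)])
rmsd = profile_rmsd(reference, model)
```

Because the energy is linear in the force constants once multiplicities and phases are fixed, an ordinary least-squares solve is sufficient here; the RMSD between the shifted profiles then gives a single number for how well a candidate term reproduces the reference curve.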
It is clear from these results that nonspecific torsional parameters may not always reproduce the quantum mechanical energy barriers and minima of a given dihedral, as expected. Although the automated approach is broadly used to treat ligands within ligand−receptor complexes,40−43 uncurated torsional parameters may strongly impair the accurate description of the conformational ensemble in MD simulations of the free ligand, of the ligand bound to the protein, or even of the receptor's conformational activation. To address this issue, new dihedral potential parameters were generated (Table S1) and tested. The curves obtained using the new torsional parameters show good agreement with the respective values obtained by the QM calculations (Figure 2).

Dynamics of Chalcones and Flavonoids. The previously built and calibrated topologies of the chalcone and flavonoid scaffolds were used, together with phenol and methoxybenzene atomic partial charges and torsional parameters, to build MD topologies for compounds 3, 4, 5, 6, and 7 (Figure 3). Since these compounds have been previously characterized by NMR spectroscopy,44−47 the interproton contacts (NOESY signals) were used to validate the conformational ensemble obtained from microsecond MD simulations. These simulations were carried out in organic solvent (CHCl3 or DMSO) in order to reproduce the conditions of the experimental procedures. Whenever a distance value was below a 5 Å cutoff, it was considered a correct reproduction. In general, most of the experimentally observed contacts between the analyzed protons were properly reproduced in the simulations using organic solvents (Table 2) and water (Table S2), pointing to a precise conformational characterization of these compounds. In order to investigate the conformational prevalences of chalcone and flavonoid dihedrals, we evaluated the distribution of the torsional angles adopted by the molecules during MD simulations (Figures 4 and 5).
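The 5 Å criterion described above can be sketched in Python as follows. The text does not state how the interproton distance is averaged over the trajectory, so the r^-6 averaging used here is an assumption (it is a common choice for NOE-derived distances); the function names, the proton-pair labels, and the distance series are all hypothetical.

```python
import numpy as np


def noe_effective_distance(distances, power=6):
    """Effective distance <r^-p>^(-1/p) over the frames of a trajectory.

    p = 6 is an assumed averaging scheme, not one confirmed by the source.
    """
    r = np.asarray(distances, dtype=float)
    return float(np.mean(r ** (-power)) ** (-1.0 / power))


def check_noe_contacts(distance_series, cutoff=5.0, power=6):
    """Return {pair: (effective distance in Å, reproduced?)} for each proton pair."""
    results = {}
    for pair, series in distance_series.items():
        r_eff = noe_effective_distance(series, power)
        results[pair] = (r_eff, r_eff < cutoff)
    return results


# Hypothetical per-frame interproton distances in Å (not data from the article).
series = {
    ("Ha", "Hb"): [4.0, 4.0, 4.0, 4.0],  # always close
    ("Ha", "Hc"): [3.0, 9.0],            # close in only half of the frames
    ("Hb", "Hd"): [8.0, 8.5, 9.0],       # never close
}
contacts = check_noe_contacts(series)
for pair, (r_eff, reproduced) in contacts.items():
    print(pair, f"{r_eff:.2f}", reproduced)
```

With r^-6 averaging, short distances dominate the average, so a pair that is close during part of the trajectory can still satisfy the cutoff even when its arithmetic mean distance exceeds 5 Å, as the second pair illustrates.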
It is important to mention that all dihedrals analyzed here presented a high number of transitions between different angle populations, which suggests that our microsecond simulations were sufficiently long to sample most of the conformational states adopted by these molecules and that they were not trapped in a single energy minimum (Figures S2−S4). The chalcone structure proved to be very flexible around the dihedrals between rings A and B, as the geometry distribution showed at least two preferential states populated for each torsion (Figure 4). The chalcone without substituent groups in the aromatic rings (compound 1) presented a single most abundant conformational state for all dihedrals, as a consequence of the ring symmetry. On the other hand, methoxy groups in ortho (compound 3) caused a deviation of ±90° in D1 and 180° in D2 and shifted the most abundant D3 angle to 180°. In the case of compound 4, the monosaccharide residue broke the ring symmetry and completely changed the most abundant conformational states for D1, shifting it to a −150° angle with a minor population at 150°. For D2, a second population at 180° had an increased frequency when simulating compound 4 in water, in comparison to the nonsubstituted chalcone (compound 1). This can be explained by a transient water-bridged H-bond between the monosaccharide and the hydroxyl group in ring B, which also explains the lower frequency observed for D2 of compound 4 simulated in DMSO. For D3, a single hydroxyl group in ortho of ring B was sufficient to substantially reduce the 180° populations observed for compound 1. The flavonoid scaffold has only one torsion and can be considered more rigid than the chalcones studied here. Still, differences can be noticed regarding different vicinal substituent patterns (Figure 5).
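The two analyses mentioned above, dihedral angle distributions and the number of transitions between angle populations, can be sketched in Python for a single dihedral time series. This is only an illustration under stated assumptions: the bin width, the state centers, the tolerance, and the function names are hypothetical choices, and the short angle series is made up rather than taken from the simulations.

```python
import numpy as np


def wrap_deg(angles):
    """Wrap angles (degrees) into the interval [-180, 180)."""
    return (np.asarray(angles, dtype=float) + 180.0) % 360.0 - 180.0


def dihedral_populations(angles_deg, bin_width=30.0):
    """Normalized histogram of a dihedral time series over [-180, 180]."""
    edges = np.arange(-180.0, 180.0 + bin_width, bin_width)
    counts, _ = np.histogram(wrap_deg(angles_deg), bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts / counts.sum()


def count_transitions(angles_deg, state_centers_deg, tolerance=30.0):
    """Count changes of state along the series.

    Each frame is assigned to the nearest state center using a periodic angle
    difference; frames farther than `tolerance` from every center are treated
    as in transit and ignored.
    """
    angles = wrap_deg(angles_deg)
    centers = np.asarray(state_centers_deg, dtype=float)
    diff = np.abs(wrap_deg(angles[:, None] - centers[None, :]))
    nearest = diff.argmin(axis=1)
    assigned = diff.min(axis=1) <= tolerance
    states = nearest[assigned]
    return int(np.count_nonzero(np.diff(states)))


# Made-up dihedral series (degrees) hopping between states near -150 and 150.
angles = [-150.0, -148.0, 150.0, 152.0, -150.0, 150.0]
bin_centers, populations = dihedral_populations(angles)
n_transitions = count_transitions(angles, state_centers_deg=[-150.0, 150.0])
print(n_transitions)  # prints: 3
```

Using a periodic difference for the state assignment matters for dihedrals: populations at −150° and 150° are only 60° apart across the ±180° boundary, so a plain subtraction would misjudge their separation.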
The dihedral D of the nonsubstituted flavonoid (compound 2) interconverts rapidly between −15° and 15°, which could be misread as a continuous population at 0°. In fact, our QM calculations of fragment 9 have shown energy minima around such values, in addition to ±150°, also in agreement with previous works.48−50 The addition of a vicinal methoxy group (compounds 5 and 6) extinguished the population at ±15° and shifted the major populations to ±150°. The addition of a sugar moiety also in ortho to the dihedral (compound 7) eliminated the population with D = 150° due to steric clashes between the methoxy group and the monosaccharide, preserving only the population at D = −150°. The glycosidic linkages of compounds 4 and 7 exhibited single distributions for the ϕ and ψ angles during the MD simulations (Figures S7 and S8), demonstrating the rigidity of these linkages. On compound 4, both dihedral angles revealed only one conformational state (ϕ = −60° and ψ = −60°) (Figure S6). On compound 7, both ϕ1 and ϕ2 dihedrals revealed a single distribution at −60° and −90°, respectively, while ψ1 showed a bimodal distribution (±90°) and ψ2 revealed a single distribution also around −90°. It is important to notice that ψ1 = 90° is related to an H-bond between the hydroxyl groups of the monosaccharide units, which explains the higher frequency of this angle during MD simulations.
DOI: 10.1021/acs.jpcb.8b10139 J. Phys. Chem. B XXXX, XXX, XXX−XXX
Figure 2. Comparison of MP2/6-31G* calculations (black) and MM torsional profiles of the structures with the adjusted terms accounting for 1−4 interactions (green) and with parameters for the most similar chemical pattern found in GROMOS53A6 (red).
For all compounds, the dihedral distributions differed little between simulations in aqueous and organic solvents.
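The Experimental Section describes how such dihedral distributions are smoothed with a 21° Hann window and then searched for peaks. A minimal, dependency-free sketch of that idea follows. The 1° bin width, the threshold value, the function names, and the synthetic input are assumptions for illustration; the in-house implementation is not reproduced here.

```python
import math


def hann_window(length):
    """Hann weights normalized to sum to 1."""
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (length - 1)) for k in range(length)]
    total = sum(raw)
    return [value / total for value in raw]


def smoothed_distribution(angles_deg, window=21):
    """Fraction of frames per 1-degree bin over [-180, 180), smoothed periodically."""
    counts = [0] * 360
    for angle in angles_deg:
        counts[int(math.floor(angle + 180.0)) % 360] += 1
    weights = hann_window(window)
    half = window // 2
    n_frames = float(len(angles_deg))
    return [
        sum(weights[j] * counts[(i + j - half) % 360] for j in range(window)) / n_frames
        for i in range(360)
    ]


def find_peaks(distribution, threshold=1e-4):
    """Bin centers (degrees) that are strict local maxima above the threshold."""
    peaks = []
    for i, value in enumerate(distribution):
        left = distribution[i - 1]            # index -1 wraps around the circle
        right = distribution[(i + 1) % 360]
        if value > threshold and value > left and value > right:
            peaks.append(i - 179.5)
    return peaks


# Synthetic, hypothetical data: two bell-shaped populations of different sizes.
angles = []
for center, height in ((-149.5, 1000), (30.5, 400)):
    for k in range(-40, 41):
        angles.extend([center + k] * int(round(height * math.exp(-0.5 * (k / 8.0) ** 2))))

peaks = find_peaks(smoothed_distribution(angles))
```

The periodic indexing keeps populations near ±180° from being split artificially at the edge of the angle range.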
In specific cases, as in D1 and D2 of compound 4, water molecules mediated intramolecular interactions that increased minor populations observed in simulations in DMSO.
Energetic Effects of Substitutions and Solvent. To evaluate the effect of different substituents and solvents on the torsional barriers specifically designed in this work for flavonoids and chalcones, a series of metadynamics calculations51−53 were performed for each dihedral angle separately or in vicinal couples, both in water and in organic solvent (CHCl3 or DMSO). For the chalcone fragments (10−12), dihedrals D1 and D2 were used as collective variables (CVs) to calculate the free-energy surfaces of their torsions. Our results show that the addition of different groups on the external rings can modify the torsional free energy associated with the dihedrals adjacent to the carbonyl group (Figures 6 and 7). When compared to fragment 10, the torsional profile of D1 in fragments 11 and 12 underwent major modifications, in both the energy barrier and the number of minima. For fragment 11, the presence of methoxy groups vicinal to the dihedral shifted the energy minimum of D1 to ±90° and increased the torsional activation energy from 20 to 60 kJ/mol, most likely due to the steric hindrance of such substituents. For D2 in fragment 11, the addition of methoxy groups decreased the torsional activation energy by 10 kJ/mol, shifting the dihedral frequency to 180°, as seen in Figure 4. For fragment 12 in DMSO, the presence of the monosaccharide near dihedral D1 induced a shift of the free-energy minimum to −135°, with a local minimum at 135°. In the case of dihedral D2, the monosaccharide increases the free energy at −180°, increasing the dihedral population at 0°. However, simulations in water revealed a new energy minimum at D1 = 75° due to the increase in the free energy at 135°, which substantially reduces the energy barrier between the two populations, yielding an increase in the frequency of D1 = 75° for compound 4, as shown in Figure 4. In addition, the dihedral distribution of the complete molecules in water revealed an increased population of D2 at ±180°. The new torsional angle of D1 = 75° allows a water-mediated H-bond between the sugar moiety and the hydroxyl group in ring B, increasing the frequency of D2 = ±180°. These results suggest that the organization and strength of solvent interactions around flexible molecules can substantially influence their uncomplexed dynamics in solution.
Figure 3. Chalcones and flavonoids employed in this work to validate the conformational ensemble obtained by MD simulations, using experimental NOESY signals.44−47
The energetic impact of different ring substitutions and solvents was evaluated for dihedral D3 in compounds 1, 3, and 4 using dihedral D3 as CV. Metadynamics calculations of compound 1 in water and CHCl3 are in accordance with the QM and MM torsional profile generated for fragment 2, with free-energy minima at 0° and 180°. However, the presence of a vicinal methoxy group (compound 3) extinguished the minimum at 0°, inducing the preferential conformation at D3 = ±180° in both water and CHCl3. In the case of compound 4 in DMSO, the presence of a vicinal hydroxyl group maintained the 0° minimum while increasing the free energy at 180°, which also increased the preference for torsion D3 at 0°. Despite that, simulations in water showed a new local minimum at ±140°, with a free-energy decrease of around 10 kJ/mol in comparison to the DMSO simulations. Further investigations revealed an intermolecular interaction between the sugar moiety and the hydroxyl group in ring B mediated by one or two water molecules, which explains the slight increase of the D3 population at 140°.
Still, the preferential conformation of D3 remained around 0°, with a large deviation of ±40°.
Table 2. NOESY Contacts of Compounds and Interproton Distances Derived from Microsecond MD Simulations in Organic Solvents. The asterisk (∗) indicates NOESY contacts that could be above 5 Å during the MD simulations. The average distances were computed as $\langle r^{-6} \rangle^{-1/6}$, respecting the NOE intensity for small molecules.
Aiming to evaluate the energetic impact of torsions of nearby substituents on flavonoids, metadynamics calculations were also performed for compounds 2, 5, 6, and 7 (Figure 8) using the dihedral D as CV. Calculations of compound 2 showed angular minima at ±30° and ±150°, in accordance with the torsional profile calculated in vacuo by QM methods for fragment 9. It is important to notice the low energy barrier between +30° and −30° or +180° and −180°, which explains the rapid interconversion between these close minima. However, the addition of a methoxy group near the dihedral D (compound 5) extinguished the minimum at ±30° and shifted the global minimum from ±180° to ±145°. Still, a minor population at D = ±30° can be observed in Figure 5, which can be explained by the low barrier between the global and these local minima at 30°. For compound 6, the presence of a second methoxy group in para did not change the global or local energy minima but increased the free-energy barrier from 6 to 22 kJ/mol while increasing the free energy at D = 45°. For compound 7, the presence of a sugar moiety in ortho to the dihedral maintained the global minimum at −150° while creating a new local minimum at D = 30° due to the torsional asymmetry. Even though transitions could be observed between these minima, the dihedral distribution of D in compound 7 showed a complete preference for −145° ± 20° due to an intramolecular H-bond between the sugar moieties.
Also, the free-energy profiles associated with the ϕ and ψ dihedrals of sugar moieties in compounds 12, 13, 14, and 15 (Figures S5 and S6) were only slightly influenced by the different solvents, yielding similar minima regions on their free-energy torsional landscapes. In particular, ψ1 on compound 7 revealed a bimodal distribution at 90° and −90°, in contrast with ψ2. This can be explained by an intramolecular H-bond between the sugar moieties when ϕ1 = 90°, which maintained ϕ2 at −90°. The data gathered here regarding the effect of nearby substitutions and the possible solvent effects on torsional free-energy barriers can be of interest for medicinal chemists while designing new ligands or increasing the potency of old ones. In fact, mapping such torsional free-energy profiles can also be useful to predict likely and unlikely conformations of ligands a priori, a challenging task when starting from 2D chemical structures.
Dynamics in Solution. The broad biological activities of natural products in traditional folk medicine4−12 have increased the interest of medicinal chemists in understanding the basis of molecular recognition at the ligand−receptor complex level. Thus, the knowledge of conformational preferences of bioactive compounds in solution is relevant not only to predict the enthalpic and entropic costs of binding but also to evaluate possible conformational selection or induced fit mechanisms of recognition, which in turn provide valuable insights for rational drug design. In this sense, we have generated torsional parameters for chalcones and flavonoids that yielded good agreement with QM-calculated torsional profiles. Moreover, atomic partial charges were calibrated for chalcone and flavonoid scaffolds, as well as for common substituents, using experimental thermodynamic properties from organic liquids as targets.
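The Experimental Section models this charge calibration as a bounded linear least-squares problem, A·C = K∘Q with a total-charge restraint, solved with SciPy. The sketch below is a dependency-free stand-in and not the authors' tool: it minimizes the same residual by projected gradient descent and adds the total-charge restraint as a weighted extra row. The three-site geometry, reference charges, bounds, weight, and iteration count are hypothetical, and Q is assumed to have three components so that it can scale K element-wise.

```python
def dipole(positions, charges):
    """Return A·C, the dipole vector sum_i q_i * r_i (arbitrary units)."""
    return [sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)]


def adjust_charges(positions, ref_charges, scale, lower, upper,
                   total_charge=0.0, weight=1.0, n_iter=2000):
    """Minimize ||M·C - b||^2 within [lower, upper] by projected gradient descent."""
    n = len(ref_charges)
    target = [k * q for k, q in zip(dipole(positions, ref_charges), scale)]  # K∘Q
    # Stacked system: three dipole rows plus one weighted total-charge row.
    rows = [[positions[i][k] for i in range(n)] for k in range(3)]
    rows.append([weight] * n)
    rhs = target + [weight * total_charge]
    # 1 / (2 * ||M||_F^2) is never larger than the inverse Lipschitz constant.
    step = 1.0 / (2.0 * sum(v * v for row in rows for v in row))
    charges = [min(max(c, lo), hi) for c, lo, hi in zip(ref_charges, lower, upper)]
    for _ in range(n_iter):
        residual = [sum(m * c for m, c in zip(row, charges)) - b
                    for row, b in zip(rows, rhs)]
        gradient = [2.0 * sum(rows[j][i] * residual[j] for j in range(len(rows)))
                    for i in range(n)]
        # Gradient step followed by clipping (projection onto the bounds).
        charges = [min(max(c - step * g, lo), hi)
                   for c, g, lo, hi in zip(charges, gradient, lower, upper)]
    return charges


# Hypothetical three-site fragment (coordinates and charges are illustrative only).
positions = [(0.0, 0.0, 0.0), (0.8, 0.6, 0.0), (-0.8, 0.6, 0.0)]
reference = [-0.8, 0.4, 0.4]
lower = [c - 0.1 for c in reference]
upper = [c + 0.1 for c in reference]
new_charges = adjust_charges(positions, reference, [1.1, 1.1, 1.1], lower, upper)
```

With a uniform scale vector the dipole magnitude grows by 10% while its direction and the total charge are preserved, which is the behavior the bounded least-squares formulation is meant to guarantee.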
Topologies built with such parameters yielded good agreement with interproton NMR data during molecular dynamics simulations, which reinforces the accuracy and reliability of our parameters for conformational and energetic studies of chalcones and flavonoids.
Figure 4. Distribution of the dihedral angles within the main chalcone structure during microsecond MD simulations.
With these results in hand, the conformational sampling of compounds 1−7 was evaluated during microsecond classical molecular dynamics simulations, which allowed the identification of their main conformational states in water and organic solution, compiled in Tables S3 and S4. For compound 1, there are two main conformational states in both solvents (Figure 9A and Figure 9B). The most abundant conformation for this chalcone in CHCl3 is related to an energetic preference for D2 = 0°, as discussed above, yielding an abundance of nearly 80%, while D2 = ±180° accounts for 20%. In water, these abundances are 85% and 15%, respectively. For compound 3, the opposite behavior is observed in both solvents when compared to compound 1. Aside from the ±90° angle for D1, dihedral D2 in compound 3 presented a preference for ±180°, with an abundance of nearly 80% in both organic solvent and water. Compound 4 behaves differently in each solvent. In DMSO, three conformational states were found, while four conformations were identified in water (Figure 9A and Figure 9B), although the main conformations found in both solvents were equivalent (with abundances of 35% and 23%, respectively). Moreover, the second population in DMSO (D1 = 140°, D2 = 0°, D3 = 0°), with an abundance of 32%, represents only 9% of the populations in water, where it is the third most abundant conformation.
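The abundances quoted here come from grouping frames into tuples of per-dihedral populations, as described in the Experimental Section. A minimal sketch of that bookkeeping is shown below. It assigns each angle to the nearest peak on the circle, which is a simplification of the valley-bounded regions used in this work; the function names, peak positions, and frames are hypothetical.

```python
from collections import Counter


def circular_distance(a, b):
    """Absolute angular separation in degrees, accounting for periodicity."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def nearest_peak(angle, peaks):
    """Return the peak angle closest to the given dihedral value."""
    return min(peaks, key=lambda peak: circular_distance(angle, peak))


def population_abundances(frames, peaks_per_dihedral):
    """Map each tuple of peak labels to the fraction of frames that carry it."""
    labels = Counter(
        tuple(nearest_peak(angle, peaks)
              for angle, peaks in zip(frame, peaks_per_dihedral))
        for frame in frames
    )
    n_frames = len(frames)
    return {label: count / n_frames for label, count in labels.most_common()}


# Hypothetical frames holding two dihedrals (degrees) each.
frames = [(140, 0), (138, 5), (-150, 0), (-148, 178), (142, -179)]
peaks_per_dihedral = [(-150, 140), (0, 180)]
abundances = population_abundances(frames, peaks_per_dihedral)
```

The number of keys in the result is the number of distinct conformational populations, and each value is the relative abundance of that population.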
The second most abundant population in water is stabilized by a water-mediated H-bond between the sugar moiety and the hydroxyl group in chalcone ring B (data not shown), thus explaining its 15% abundance. The glycosidic linkage showed the same value for all conformations (Table S3), in line with previous analyses of the ϕ and ψ angles, which demonstrated only one abundant conformational state (Figure S6). It is important to notice that while the conformational profile of compound 4 described here accounts for nearly 80% of the total conformations obtained by MD simulations in DMSO, we were able to identify only 47% of the conformations obtained in water (Table S3). These results suggest that the addition of sugar moieties increases the flexibility of chalcones, especially when capable of intramolecular interactions, by stratifying the major identifiable conformations.
In general, flavonoids presented the same conformational populations in both solvents (Figure 10A and Figure 10B), and the combinations of the most common conformational populations were compiled in Table S4. Compound 2 presented only one conformational state in solution, with dihedral D = ±30°, rapid interconversion, and a low free-energy barrier between these conformational states. On the other hand, the most common conformational populations for compounds 5 and 6 could be identified at D = ±150°, with nearly 50% for each dihedral angle. As previously seen, the methoxy groups on the rings of compounds 5 and 6 influenced these conformational profiles, explaining these new configurations when compared to compound 2. Although the main dihedral angle of compound 7 showed the same conformational state for all conformations (Table S4), the analysis of the relative abundances of each dihedral indicated two conformations for the ψ1 angle, at ±90° (Table S4). The preference for ψ1 = 90° is related to a transient intramolecular H-bond between the sugar moieties (data not shown).
These data are in accordance with the previous analysis (Figure S8) that showed two conformational states for this dihedral angle. The conformational sampling obtained here was also capable of describing the conversion between different conformational states. In the context of a conformational selection recognition mechanism for small ligands, such a strategy may therefore be a useful way to guide the choice of ligand conformations for future studies, such as 3D-QSAR and docking calculations. In the context of an induced fit recognition mechanism, prior knowledge of free-energy torsional surfaces can provide quantitative data on the energetic cost of fitting a given pharmacophoric region by twisting the main dihedrals of the scaffold.
Figure 5. Distribution of the main dihedral angle associated with the linkage between the rings of flavonoid structures during microsecond MD simulations.
■ CONCLUSIONS
In the current study, a new parameter set for force field calculations of chalcones and flavonoids was presented, in which we included new torsional potentials within the GROMOS force field, as well as a set of atomic partial charges. The major advantage over previously proposed modifications of the force field is that this approach is still compatible with the general GROMOS parameter set for other classes of biomolecules, allowing prompt simulations of ligand−receptor complexes. The addition of new torsional potentials is a similar approach to that performed for improving GROMOS parameters for proteins,54,55 carbohydrates,56 and aromatic rings commonly used in drug design,25 which allows a state-of-the-art description of the conformational profile of ligands.
The generated parameters for the description of small molecules reproduce QM and experimental data well, while microsecond MD simulations of complete chalcones and flavonoids were capable of reproducing experimental interproton NOE contacts, suggesting a precise conformational characterization of these molecules. This allowed us to evaluate the energetic impact of common ring substitutions on the free-energy torsional profile of each main dihedral of the chalcone and flavonoid scaffolds, as well as the effect of solvent substitution on such energies. Moreover, we were able to identify the most common conformational populations of these molecules in both organic solvent and water, providing quantitative information for medicinal chemists in rational drug design efforts. This set of parameters and the conformational sampling of chalcones and flavonoids are expected to contribute to future studies, supplying accurate results through MD simulations.
Figure 6. Free energy profiles obtained from metadynamics simulations for the dihedral angles D1 and D2 on fragments 10, 11, and 12, related to the main structure of the chalcone scaffold.
■ EXPERIMENTAL SECTION
Derivation of New Torsional Parameters. The QM torsional profiles of dihedrals within the structures were obtained using Gaussian 03 (ref 57). These QM calculations were carried out using the scan routine combined with a tight convergence criterion at the MP2 level with the 6-31G* basis set, obtaining the relative energy associated with the rotation of each dihedral in increments of 30°. The potential energy term associated with the torsion around a dihedral angle m in MM calculations is described by
the following equation, where $\phi_m$ is the dihedral angle value, $n_m$ is the multiplicity of the term, $\delta_m$ the associated phase shift, and $k_{\phi,m}$ the corresponding force constant:
$$V_{\phi,m} = k_{\phi,m}\,[1 + \cos(\delta_m)\cos(n_m \phi_m)] \qquad (1)$$
Hence, MM calculations were performed in GROMACS 5.0.7 for every dihedral angle evaluated by the previous QM methods, evaluating the total potential energy related to the conformation, including 1,4 nonbonded interactions.56,58 Both QM and MM torsional energies were then submitted to the Rotational Profiler server,59 which calculates the energy gap between both profiles and provides MM torsional parameters fitted to the QM profile. These new parameters were then implemented into the topologies for MD simulations.
Figure 7. Free energy profiles obtained from metadynamics simulations for the dihedral angle D3 on compounds 1, 3, and 4.
Parametrization Strategy and Topology Construction. In order to describe chalcones and flavonoids through molecular mechanics techniques, a set of aromatic rings with substituents commonly found in chalcones and flavonoids was selected as building blocks. The parametrization strategy was based on accurately reproducing experimental values for physicochemical properties of organic liquids. Topologies were constructed for the fragments using the potentials for bond stretching, bond-angle bending, and improper dihedral deformation, as well as van der Waals interaction terms, retrieved directly from the GROMOS53A6 set (ref 27). In order to obtain atomic partial charges, QM calculations were performed using Gaussian 09,57 at the second-order Møller−Plesset perturbation (MP2)60 level with the 6-31G* basis set, in implicit PCM (polarizable continuum model) solvent,61 followed by a RESP fitting.62 Charge adjustments were made
to properly reproduce the experimental properties in MM conditions, taking care to maintain the dipole moment direction obtained from QM calculations, using an in-house tool based on a least-squares fit solution (available in the Supporting Information). All of the MD simulations and analyses were performed using the GROMACS simulation suite (ref 63), version 5.0.7 (ref 64). In order to derive the charges for the entire chalcone or flavonoid molecule, larger fragments were used. Atomic group charges of the common substituents previously calculated were used and, in the case of overlapping group charges, adjustments were carried out in order to maintain the total dipole moment of the fragment. Entire molecules were therefore built by adding these fragments as building blocks, allowing us to describe differently substituted chalcones and flavonoids.
Least-Squares Fit Solution. The adjustment of charges while keeping the total dipole moment of a molecule was modeled as a linear least-squares problem with bounds on the variables65,66 and solved using the SciPy library (ref 67) for the Python 2.7 programming language. In this scheme, x, y, and z atomic coordinates are obtained from a SYBYL MOL2 format file generated after an MP2/6-31G* calculation, along with their respective partial charges derived from a RESP fitting. Thus, in this modeling, all n atoms form a matrix A of atomic positions, with $a_{nx}$ being the x coordinate of atom $a_n$ and so on.
$$A = \begin{bmatrix} a_{1x} & a_{2x} & \cdots & a_{nx} \\ a_{1y} & a_{2y} & \cdots & a_{ny} \\ a_{1z} & a_{2z} & \cdots & a_{nz} \end{bmatrix}$$
$C_{ref}$ is the vector of partial charges obtained from QM calculations, and $r_n$ is the reference charge of atom $a_n$.
$$C_{ref} = [r_1 \; r_2 \; \cdots \; r_n]^T$$
K is the dipole moment computed from the QM charges, and $\cdot$ is the dot product.
$$A \cdot C_{ref} = K$$
L is the vector of lower bound values for the new charges, with $l_n$ being the lower bound of the new charge of atom $a_n$, while U is the vector of upper bound values for the new charges and $u_n$ is the upper bound of the new charge of atom $a_n$.
$$L = [l_1 \; l_2 \; \cdots \; l_n]^T \qquad U = [u_1 \; u_2 \; \cdots \; u_n]^T$$
Q is the coefficient vector used to change the magnitude of the dipole moment K.
$$Q = [q_1 \; q_2 \; \cdots \; q_n]^T$$
From this, the goal is to find the vector of new charges C, where $c_n$ is the new charge of atom $a_n$, using the linear least-squares method.
$$C = [c_1 \; c_2 \; \cdots \; c_n]^T$$
Vector C is the solution of the system $A \cdot C = K \circ Q$, with restraints $\sum_{i=1}^{n} c_i = m$ and $l_i \le c_i < u_i$ for every $c_i \in C$, in which $\circ$ is the Hadamard product (element-wise multiplication) and m is the total charge of the molecule. This ensures that the new set of charges C maintains the original dipole moment direction and total charge of the molecule from $C_{ref}$ while also respecting the lower and upper bounds and altering the magnitude of the new dipole moment. The main advantage of this approach is that it combines the transferability principle of calibrated atomic charge groups with the dipole moment direction obtained from QM calculations.
Figure 8. Free energy profiles obtained from metadynamics simulations for the main dihedral angle of compounds 2, 5, 6, and 7.
Figure 9. Abundance of different conformations of chalcones during microsecond MD simulations performed in organic solvents (A) and water (B) for compounds 1, 3, and 4.
Figure 10. Abundance of different conformations of flavonoids during microsecond MD simulations performed in organic solvents (A) and water (B) for compounds 2, 5, 6, and 7.
Liquid- and Gas-Phase Simulations for Assessment of Thermodynamic Properties.
Physicochemical properties of organic liquids (density and enthalpy of vaporization) were used as targets to validate our topologies, as in previous works on the parametrization of small biomolecules24,26,32 and the benchmarking of force fields.25,35 The protocol described in Horta et al.24 was applied to all building blocks containing functional groups necessary for the assembly of complex chalcones and flavonoids. These fragments were chosen considering the availability of experimental values of density and enthalpy of vaporization, and the topologies were accepted as useful when the absolute error between experimental and simulated properties was below 15%. In order to calculate thermodynamic properties of organic liquids, a condensed phase was induced by simulating 125 molecules under 100 bar. The box was scaled 2 × 2 × 2 in order to obtain 1000 molecules in the liquid phase. All simulations were carried out with Berendsen pressure and temperature coupling algorithms,68 using τT = 0.2 ps and τP = 0.5 ps, along with the reaction-field method to compute electrostatic interactions69,70 using ϵRF as the experimental dielectric constant,24,27 while the experimental isothermal compressibility was used as an additional parameter when available.24,27 Otherwise, the compressibility of the most chemically similar molecule was used. While liquid-phase simulations were carried out for 10 ns using the leap-frog algorithm, gas-phase simulations were performed using the stochastic dynamics algorithm71 to simulate a single molecule in vacuum for 100 ns. The LINCS algorithm was applied to constrain all bonds. The potential energies associated with these systems (Epot(g) for the gas phase and Epot(l) for the liquid phase) were extracted and used to calculate (eq 2) the enthalpy of vaporization (ΔHvap) of the fragments.
$$\Delta H_{vap} = (E_{pot}(g) + k_B T) - E_{pot}(l) \qquad (2)$$
Organic liquid densities (ρ) were calculated from liquid-phase simulations using block averages of five blocks, as for ΔHvap. MD simulations were carried out by means of the GROMACS 5.0.7 package, and all the analyses employed dedicated tools from the GROMACS package, associated with in-house scripts to calculate thermodynamic properties.
Metadynamics Simulations. For the structural assessment of complete chalcones and flavonoids, metadynamics simulations were performed in order to determine the conformational preferences of dihedral angles in the main scaffold of these compounds and the associated carbohydrate moieties. Several fragments containing the dihedrals of interest were constructed and simulated for 50 ns, at 298 K, in nonaqueous solvents (chloroform or DMSO, to reproduce the conditions of the NMR experiments concerning the chosen complete chalcones and flavonoids) and in water, as a control, in cubic boxes using periodic boundary conditions. The systems were submitted to energy minimization by the steepest descent algorithm, followed by an equilibration phase of 2 ns and subsequently by well-tempered (WT) metadynamics simulations. Gaussian hills with an initial height of 1.2 kcal·mol−1 were applied, along with a hill width of 0.35 radians. In this WT scheme, Gaussian functions were rescaled employing a bias factor of 10. Pressure was kept constant at 1 atm by a Parrinello−Rahman barostat,72,73 with a 2.0 ps coupling constant, and temperature was kept constant by a V-rescale thermostat (NVT step), with a coupling constant of τ = 0.1. The LINCS method74,75 was applied to constrain covalent bond lengths, allowing an integration step of 2 fs. For the systems solvated with DMSO, all bond lengths were constrained using the SHAKE algorithm.76,77 The reaction-field method69,70 was applied in the calculation of electrostatic interactions.
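To make the well-tempered rescaling concrete, the sketch below applies the standard WT rule, in which each new hill height is the initial height multiplied by exp(−V/(kB·ΔT)) with ΔT = (γ − 1)·T, to the settings quoted above (1.2 kcal·mol−1, 0.35 rad, 298 K, bias factor 10). It is a one-dimensional toy with hypothetical function names and a hypothetical sequence of visited CV values, not the PLUMED implementation used in this work.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def well_tempered_height(initial_height, bias_here, temperature, bias_factor):
    """Hill height after rescaling by the bias already deposited at this CV value."""
    delta_t = (bias_factor - 1.0) * temperature
    return initial_height * math.exp(-bias_here / (K_B * delta_t))


def gaussian_hill(cv, center, height, width):
    """Value at cv (radians) of one periodic Gaussian hill centered at center."""
    d = (cv - center + math.pi) % (2.0 * math.pi) - math.pi
    return height * math.exp(-0.5 * (d / width) ** 2)


def deposit_hills(visited_cv, initial_height=1.2, width=0.35,
                  temperature=298.0, bias_factor=10.0):
    """Return (center, height) pairs for hills deposited along visited CV values."""
    hills = []
    for center in visited_cv:
        bias_here = sum(gaussian_hill(center, c, h, width) for c, h in hills)
        height = well_tempered_height(initial_height, bias_here, temperature, bias_factor)
        hills.append((center, height))
    return hills


# Hypothetical case: the dihedral stays at 0 rad for five consecutive depositions.
heights = [h for _, h in deposit_hills([0.0] * 5)]
```

Repeated visits to the same dihedral value receive progressively smaller hills, which is what lets the bias converge smoothly instead of growing without limit.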
GROMACS 4.6.1 interfaced with the PLUMED plugin, version 2.0b1 (ref 78), was used. The free energy surfaces were obtained with the sum_hills tool from the PLUMED package. Error estimates were calculated using the block-analysis technique, while the reweighting procedure was performed based on the work of Branduardi et al.79
NOE Contacts Assessment in MD Simulations. The complete structures of chalcones and flavonoids were submitted to microsecond MD simulations in organic solvents (chloroform or DMSO) and water. The MD conditions were generally the same as in the metadynamics calculations, with longer equilibration (20 ns) and production (1000 ns) phases. MD simulations were carried out by means of the GROMACS 5.0.7 package, and all the analyses employed dedicated tools from the GROMACS package, associated with in-house scripts. To allow a comparison of the simulations to H NMR data (NOESY signals) of the compounds, nonpolar hydrogen atoms were added to frames retrieved from the trajectories, using PyMOL (ref 80). The obtained models were used to calculate the average interproton distances from the simulations, using the gmx mindist tool from GROMACS. The average distances were computed as $\langle r^{-6} \rangle^{-1/6}$, respecting the NOE intensity for small molecules.
Identification of Conformational Populations. Considering the dihedral angles of a molecule throughout an MD simulation, a conformational population is a set of conformations that share similar values for their respective dihedral angles. In order to determine these conformational populations (that is, to measure whether structures share dihedral angle values close enough to be grouped together), the following procedure was implemented:
(1) The value of each dihedral angle was measured for each simulation time step, as well as the distribution of the angle (how much of the total simulated time was spent in each angle value).
These distributions were smoothed using a sliding window of length 21° with the Hann function,81 obtaining a curve with a well-behaved gradient.
(2) From this distribution, “peaks” and “valleys” were identified. A peak is defined as an angle with a local maximum value, that is, the distribution at that angle is larger than the distribution at its immediate neighbors. Analogously, a valley is an angle with a local minimum value, or an angle whose distribution falls below a given threshold, indicating a value so low that the angle should be considered spurious.
Knowing the peaks and valleys, the dihedral populations of each torsional bond were identified by the peak angle between two valleys, corresponding to a region of high distribution. The conformational populations of a molecule were then characterized by combining the populations of each single dihedral angle (identified by the peak values) that occur at the same time step, building a tuple of n peaks, n being the number of torsional bonds. Thus, all conformations identified by the same tuple of dihedral values belong to the same conformational population, the number of conformations that receive the same tuple determines the relative abundance of the conformational population, and the number of different tuples is the number of different populations throughout a simulation.
■ ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcb.8b10139.
Conformational data regarding the sugar moieties during metadynamics calculations, conformational data regarding the compounds on refinement MD simulations, conformational data regarding the sugar moieties on complete compounds, interproton distances of the compounds simulated in water (PDF) (ZIP) (ZIP)
■ AUTHOR INFORMATION
Corresponding Author
*E-mail: hverli@cbiot.ufrgs.br. Phone: +55 (51) 3308-7770. Fax: +55 (51) 3308-7309.
ORCID
Marcio Dorn: 0000-0001-8534-3480
Hugo Verli: 0000-0002-4796-8620
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
This research received funding from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), and was supported by the Centro Nacional de Supercomputação of the Universidade Federal do Rio Grande do Sul (CESUP/UFRGS). This work was supported by grants from FAPERGS [Grant 16/25510000520-6], MCT/CNPq [Grant 311022/2015-4], CAPES-STIC AMSUD [Grant 88887.135130/2017-01], Brazil, CAPES/Drug Discovery Grant 23038.007777/2014-87, and Alexander von Humboldt-Stiftung (AvH) [Grant BRA 1190826 HFSTCAPES-P], Germany.

The Journal of Physical Chemistry B, Article. DOI: 10.1021/acs.jpcb.8b10139

■ REFERENCES
(1) Harvey, A. L. Natural products in drug discovery. Drug Discovery Today 2008, 13, 894−901.
(2) Cragg, G. M.; Newman, D. J. Natural products: A continuing source of novel drug leads. Biochim. Biophys. Acta, Gen. Subj. 2013, 1830, 3670−3695.
(3) Fabricant, D. S.; Farnsworth, N. R. The value of plants used in traditional medicine for drug discovery. Environ. Health Perspect. 2001, 109, 69−75.
(4) Yang, Z.; Wu, W.; Wang, J.; Liu, L.; Li, L.; Yang, J.; Wang, G.; Cao, D.; Zhang, R.; Tang, M.; et al. Synthesis and biological evaluation of novel millepachine derivatives as a new class of tubulin polymerization inhibitors. J. Med. Chem. 2014, 57, 7977−7989.
(5) Stoll, R.; Renner, C.; Hansen, S.; Palme, S.; Klein, C.; Belling, A.; Zeslawski, W.; Kamionka, M.; Rehm, T.; Muhlhahn, P.; et al. Chalcone derivatives antagonize interactions between the human oncoprotein MDM2 and p53. Biochemistry 2001, 40, 336−344.
(6) Wang, H. M.; Zhang, L.; Liu, J.; Yang, Z. L.; Zhao, H. Y.; Yang, Y.; Shen, D.; Lu, K.; Fan, Z. C.; Yao, Q. W.; et al. Synthesis and anticancer activity evaluation of novel prenylated and geranylated chalcone natural products and their analogs. Eur. J. Med. Chem. 2015, 92, 439−448.
(7) Duarte, J.; Pérez-Palencia, R.; Vargas, F.; Ocete, M. A.; Pérez-Vizcaino, F.; Zarzuelo, A.; Tamargo, J. Antihypertensive effects of the flavonoid quercetin in spontaneously hypertensive rats. Br. J. Pharmacol. 2001, 133, 117−124.
(8) Seufi, A. M.; Ibrahim, S. S.; Elmaghraby, T. K.; Hafez, E. E. Preventive effect of the flavonoid, quercetin, on hepatic cancer in rats via oxidant/antioxidant activity: molecular and histological evidences. J. Exp. Clin. Cancer Res. 2009, 28, 80.
(9) López, S. N.; Castelli, M. V.; Zacchino, S. A.; Domínguez, J. N.; Lobo, G.; Charris-Charris, J.; Cortés, J. C.; Ribas, J. C.; Devia, C.; Rodríguez, A. M.; et al. In vitro antifungal evaluation and structure-activity relationships of a new series of chalcone derivatives and synthetic analogues, with inhibitory properties against polymers of the fungal cell wall. Bioorg. Med. Chem. 2001, 9, 1999−2013.
(10) Liu, H.-r.; Liu, X.-j.; Fan, H.-q.; Tang, J.-j.; Gao, X.-h.; Liu, W.-K. Design, synthesis and pharmacological evaluation of chalcone derivatives as acetylcholinesterase inhibitors. Bioorg. Med. Chem. 2014, 22, 6124−6133.
(11) Niu, Y.; Zhu, H.; Liu, J.; Fan, H.; Sun, L.; Lu, W.; Liu, X.; Li, L. 3,5,2′,4′-Tetrahydroxychalcone, a new non-purine xanthine oxidase inhibitor. Chem.-Biol. Interact. 2011, 189, 161−166.
(12) Uriarte-Pueyo, I.; Calvo, M. I. Flavonoids as acetylcholinesterase inhibitors. Curr. Med. Chem. 2011, 18, 5289−5302.
(13) Zhuang, C.; Zhang, W.; Sheng, C.; Zhang, W.; Xing, C.; Miao, Z. Chalcone: A Privileged Structure in Medicinal Chemistry. Chem. Rev. 2017, 117, 7762−7810.
(14) Santos-Buelga, C., Escribano-Bailon, M. T., Lattanzio, V., Eds. Recent Advances in Polyphenol Research; Wiley-Blackwell, 2010; Vol. 2, pp 1−332.
(15) Singh, P.; Anand, A.; Kumar, V. Recent developments in biological activities of chalcones: A mini review. Eur. J. Med. Chem. 2014, 85, 758−777.
(16) Cazarolli, L. H.; Zanatta, L.; Alberton, E. H.; Figueiredo, M. S. R. B.; Folador, P.; Damazio, R. G.; Pizzolatti, M. G.; Silva, F. R. M. B. Flavonoids: prospective drug candidates. Mini-Rev. Med. Chem. 2008, 8, 1429−1440.
(17) Jorgensen, W. L. The many roles of computation in drug discovery. Science (Washington, DC, U. S.) 2004, 303, 1813−1818.
(18) Sliwoski, G.; Kothiwale, S.; Meiler, J.; Lowe, E. W. Computational methods in drug discovery. Pharmacol. Rev. 2014, 66, 334−395.
(19) Durrant, J. D.; McCammon, J. A. Molecular dynamics simulations and drug discovery. BMC Biol. 2011, 9, 71.
(20) De Vivo, M.; Masetti, M.; Bottegoni, G.; Cavalli, A. Role of molecular dynamics and related methods in drug discovery. J. Med. Chem. 2016, 59, 4035−4061.
(21) van Gunsteren, W. F.; Bakowies, D.; Baron, R.; Chandrasekhar, I.; Christen, M.; Daura, X.; Gee, P.; Geerke, D. P.; Glättli, A.; Hünenberger, P. H.; et al. Biomolecular modeling: goals, problems, perspectives. Angew. Chem., Int. Ed. 2006, 45, 4064−4092.
(22) Cunha, R.; Soares, T.; Husu, V.; Pontes, F.; Franca, E.; Lins, R. The Complex World of Polysaccharides; InTech, 2012; Chapter 9, pp 229−256.
(23) Lemkul, J. A.; Allen, W. J.; Bevan, D. R. Practical considerations for building GROMOS-compatible small-molecule topologies. J. Chem. Inf. Model. 2010, 50, 2221−2235.
(24) Horta, B. A. C.; Merz, P. T.; Fuchs, P. F. J.; Dolenc, J.; Riniker, S.; Hünenberger, P. H. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. J. Chem. Theory Comput. 2016, 12, 3825−3850.
(25) Polêto, M. D.; Rusu, V. H.; Grisci, B. I.; Dorn, M.; Lins, R. D.; Verli, H. Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities. Front. Pharmacol. 2018, 9, 395.
(26) Tesch, R.; Becker, C.; Müller, M. P.; Beck, M. E.; Quambusch, L.; Getlik, M.; Lategahn, J.; Uhlenbrock, N.; Costa, F. N.; Polêto, M. D. An Unusual Intramolecular Halogen Bond Guides Conformational Selection. Angew. Chem., Int. Ed. 2018, 57, 9970−9975.
(27) Oostenbrink, C.; Villa, A.; Mark, A. E.; Van Gunsteren, W. F. A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656−1676.
(28) Riniker, S. Fixed-Charge Atomistic Force Fields for Molecular Dynamics Simulations in the Condensed Phase: An Overview. J. Chem. Inf. Model. 2018, 58, 565−578.
(29) Stroet, M.; Caron, B.; Visscher, K. M.; Geerke, D. P.; Malde, A. K.; Mark, A. E. Automated Topology Builder Version 3.0: Prediction of Solvation Free Enthalpies in Water and Hexane. J. Chem. Theory Comput. 2018, 14, 5834−5845.
(30) Koziara, K. B.; Stroet, M.; Malde, A. K.; Mark, A. E. Testing and validation of the Automated Topology Builder (ATB) version 2.0: prediction of hydration free enthalpies. J. Comput.-Aided Mol. Des. 2014, 28, 221−233.
(31) Malde, A. K.; Zuo, L.; Breeze, M.; Stroet, M.; Poger, D.; Nair, P. C.; Oostenbrink, C.; Mark, A. E. An Automated Force Field Topology Builder (ATB) and Repository: Version 1.0. J. Chem. Theory Comput. 2011, 7, 4026−4037.
(32) Pedebos, C.; Pol-Fachin, L.; Verli, H. Unrestrained conformational characterization of Stenocereus eruca saponins in aqueous and nonaqueous solvents. J. Nat. Prod. 2012, 75, 1196−1200.
(33) Micaelo, N. M.; Baptista, A. M.; Soares, C. M. Parametrization of 1-butyl-3-methylimidazolium hexafluorophosphate/nitrate ionic liquid for the GROMOS force field. J. Phys. Chem. B 2006, 110, 14444−14451.
(34) Horta, B. A. C.; Fuchs, P. F. J.; Van Gunsteren, W. F.; Hünenberger, P. H. New interaction parameters for oxygen compounds in the GROMOS force field: Improved pure-liquid and solvation properties for alcohols, ethers, aldehydes, ketones, carboxylic acids, and esters. J. Chem. Theory Comput. 2011, 7, 1016−1031.
(35) Caleman, C.; van Maaren, P. J.; Hong, M.; Hub, J. S.; Costa, L. T.; van der Spoel, D. Force Field Benchmark of Organic Liquids: Density, Enthalpy of Vaporization, Heat Capacities, Surface Tension, Isothermal Compressibility, Volumetric Expansion Coefficient, and Dielectric Constant. J. Chem. Theory Comput. 2012, 8, 61−74.
(36) Haynes, W., Ed. Handbook of Chemistry and Physics; CRC, 2014; p 2704.
(37) Riddick, J.; Bunger, W.; Sakano, T. Organic Solvents: Physical Properties and Methods of Purification, 4th ed.; Wiley, 1986.
(38) Abraham, M. H.; Whiting, G. S.; Fuchs, R.; Chambers, E. J. Thermodynamics of solute transfer from water to hexadecane. J. Chem. Soc., Perkin Trans. 2 1990, 77, 291.
(39) Chickos, J. S.; Acree, W. E. Enthalpies of vaporization of organic and organometallic compounds, 1880−2002. J. Phys. Chem. Ref. Data 2003, 32, 519−878.
(40) Ding, F.; Peng, W.; Peng, Y.-K. Biophysical exploration of protein-flavonol recognition: effects of molecular properties and conformational flexibility. Phys. Chem. Chem. Phys. 2016, 18, 11959−11971.
(41) Jo, A. R.; Kim, J. H.; Yan, X.-T.; Yang, S. Y.; Kim, Y. H. Soluble epoxide hydrolase inhibitory components from Rheum undulatum and in silico approach. J. Enzyme Inhib. Med. Chem. 2016, 31, 70−78.
(42) Untergehrer, M.; Bücherl, D.; Wittmann, H.-J.; Strasser, A.; Heilmann, J.; Jürgenliemk, G. Structure-Dependent Deconjugation of Flavonoid Glucuronides by Human β-Glucuronidase-In Vitro and In Silico Analyses. Planta Med. 2015, 81, 1182−1189.
(43) Chinnadurai, R. K.; Saravanaraman, P.; Boopathy, R. Understanding the molecular mechanism of aryl acylamidase activity of acetylcholinesterase−An in silico study. Arch. Biochem. Biophys. 2015, 580, 1−13.
(44) Koteswara Rao, Y.; Vimalamma, G.; Venkata Rao, C.; Tzeng, Y. M. Flavonoids and andrographolides from Andrographis paniculata. Phytochemistry 2004, 65, 2317−2321.
(45) Reddy, M. K.; Reddy, M. V. B.; Reddy, B. A. K.; Gunasekar, D.; Caux, C.; Bodo, B. A New Chalcone and a Flavone from Andrographis neesiana. Chem. Pharm. Bull. 2003, 51, 854−856.
(46) Nørbæk, R.; Nielsen, J. K.; Kondo, T. Flavonoids from flowers of two Crocus chrysanthus-biflorus cultivars: 'Eye-catcher' and 'Spring Pearl' (Iridaceae). Phytochemistry 1999, 51, 1139−1146.
(47) Jayaprakasam, B.; Gunasekar, D.; Rao, K. V.; Blond, A.; Bodo, B. Androechin, A New Chalcone Glucoside from Andrographis Echioides. J. Asian Nat. Prod. Res. 2001, 3, 43−48.
(48) Celebre, G.; De Luca, G.; Longeri, M.; Catalano, D.; Veracini, C. A.; Emsley, J. W. Structure of biphenyl in a nematic liquid-crystalline solvent. J. Chem. Soc., Faraday Trans. 1991, 87, 2623.
(49) Jaime, C.; Font, J. Empirical force field calculations (MM2-V4) on biphenyl and 2,2′-bipyridine. J. Mol. Struct. 1989, 195, 103−110.
(50) Charbonnier, S.; Beguemsi, S.; N’Guessan, Y.; Legoff, D.; Proutiere, A.; Viani, R. Dihedral angle of biphenyl compounds studied by theoretical calculations (dipole induced dipole, molecular mechanics) and experimental methods (electro-optic measurements, infrared spectroscopy). J. Mol. Struct. 1987, 158, 109−125.
(51) Huber, T.; Torda, A. E.; van Gunsteren, W. F. Local elevation: A method for improving the searching properties of molecular dynamics simulation. J. Comput.-Aided Mol. Des. 1994, 8, 695−708.
(52) Laio, A.; Parrinello, M. Escaping free-energy minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562−12566.
(53) Barducci, A.; Bussi, G.; Parrinello, M. Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method. Phys. Rev. Lett. 2008, 100, 020603.
(54) Schmid, N.; Eichenberger, A. P.; Choutko, A.; Riniker, S.; Winger, M.; Mark, A. E.; van Gunsteren, W. F. Definition and testing of the GROMOS force-field versions 54A7 and 54B7. Eur. Biophys. J. 2011, 40, 843−856.
(55) Huang, W.; Lin, Z.; van Gunsteren, W. F. Validation of the GROMOS 54A7 Force Field with Respect to β-Peptide Folding. J. Chem. Theory Comput. 2011, 7, 1237−1243.
(56) Pol-Fachin, L.; Rusu, V. H.; Verli, H.; Lins, R. D. GROMOS 53A6 GLYC, an improved GROMOS force field for hexopyranose-based carbohydrates. J. Chem. Theory Comput. 2012, 8, 4681−4690.
(57) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; et al. Gaussian 03, revision A.01; Gaussian, Inc.: Wallingford, CT, 2004.
(58) Pol-Fachin, L.; Verli, H.; Lins, R. D. Extension and validation of the GROMOS 53A6(GLYC) parameter set for glycoproteins. J. Comput. Chem. 2014, 35, 2087−2095.
(59) Rusu, V. H.; Baron, R.; Lins, R. D. PITOMBA: Parameter Interface for Oligosaccharide Molecules Based on Atoms. J. Chem. Theory Comput. 2014, 10, 5068−5080.
(60) Møller, C.; Plesset, M. S. Note on an Approximation Treatment for Many-Electron Systems. Phys. Rev. 1934, 46, 618−622.
(61) Mennucci, B.; Tomasi, J. Continuum solvation models: A new approach to the problem of solute’s charge distribution and cavity boundaries. J. Chem. Phys. 1997, 106, 5151.
(62) Bayly, C.; Cieplak, P.; Cornell, W.; Kollman, P. A well-behaved electrostatic potential based method using charge restraints for deriving atomic···. J. Phys. Chem. 1993, 97, 10269−10280.
(63) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J. C. GROMACS: Fast, flexible, and free. J. Comput. Chem. 2005, 26, 1701−1718.
(64) Abraham, M. J.; Murtola, T.; Schulz, R.; Pall, S.; Smith, J. C.; Hess, B.; Lindahl, E. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1−2, 19−25.
(65) Stark, P. B.; Parker, R. L. Bounded-variable least-squares: an algorithm and applications. Comput. Stat. 1995, 10, 129−129.
(66) Branch, M. A.; Coleman, T. F.; Li, Y. A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems. SIAM Journal on Scientific Computing 1999, 21, 1−23.
(67) Jones, E.; Oliphant, T.; Peterson, P. SciPy: Open source scientific tools for Python. 2001−. http://www.scipy.org/.
(68) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 1984, 81, 3684−3690.
(69) Barker, J.; Watts, R. Monte Carlo studies of the dielectric properties of water-like models. Mol. Phys. 1973, 26, 789−792.
(70) Watts, R. Monte Carlo studies of liquid water. Mol. Phys. 1974, 28, 1069−1083.
(71) Van Gunsteren, W. F.; Berendsen, H. J. C. A Leap-frog Algorithm for Stochastic Dynamics. Mol. Simul. 1988, 1, 173−185.
(72) Parrinello, M.; Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 1981, 52, 7182.
(73) Nosé, S.; Klein, M. L. Constant pressure molecular dynamics for molecular systems. Mol. Phys. 1983, 50, 1055−1076.
(74) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 1997, 18, 1463−1472.
(75) Hess, B. P-LINCS: A parallel linear constraint solver for molecular simulation. J. Chem. Theory Comput. 2008, 4, 116−122.
(76) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327−341.
(77) Geerke, D. P.; van Gunsteren, W. F. Force field evaluation for biomolecular simulation: free enthalpies of solvation of polar and apolar compounds in various solvents. ChemPhysChem 2006, 7, 671−678.
(78) Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2: New feathers for an old bird. Comput. Phys. Commun. 2014, 185, 604−613.
(79) Branduardi, D.; Bussi, G.; Parrinello, M. Metadynamics with Adaptive Gaussians. J. Chem. Theory Comput. 2012, 8, 2247−2254 (PMID: 26588957).
(80) DeLano, W. PyMol: An open-source molecular graphics tool. CCP4 Newsletter On Protein Crystallography; CCP4, 2002; Number 40.
(81) Harris, F. J. On the use of windows for harmonic analysis with the discrete Fourier transform. Proc. IEEE 1978, 66, 51−83.

Development of GROMOS-Compatible Parameter Set for Simulations of Chalcones and Flavonoids

Pablo R. Arantes,† Marcelo D. Polêto,† Elisa B. O. John,† Conrado Pedebos,†,‡ Bruno I. Grisci,¶ Marcio Dorn,¶ and Hugo Verli∗,†
†Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
‡School of Pharmacy, University of Nottingham, University Park, Nottingham, U.K.
¶Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
E-mail: hverli@cbiot.ufrgs.br
Phone: +55 (51) 3308-7770. Fax: +55 (51) 3308-7309

Supporting Information

Figure 1: Comparison of QM (red) and MM (green) dipole moment vectors for each fragment evaluated in this work.

Figure 2: Distribution of geometries of the dihedral angles 1 and 2 for fragments 10, 11 and 12 during refinement MD simulations. M1 and M2 represent each energy minimum obtained from metadynamics calculations.

Figure 3: Distribution of geometries of the dihedral angle 3 for chalcones (1, 3 and 4) and main dihedral angle for flavonoids (2, 5, 6 and 7) during refinement MD simulations. M1 and M2 represent each energy minimum obtained from metadynamics calculations.

Figure 4: Distribution of geometries of the glycosidic linkages for fragments 13, 14, 15 and 12 during refinement MD simulations. M1 and M2 represent each energy minimum obtained from metadynamics calculations.

Figure 5: Free energy profiles obtained from metadynamics simulations for the glycosidic linkages on fragments 13 and 14.

Figure 6: Free energy profiles obtained from metadynamics simulations for the glycosidic linkages on fragments 15 and 12.

Figure 7: Distribution of geometries of the glycosidic linkage for fragment 4 during microsecond MD simulations.

Figure 8: Distribution of geometries of the glycosidic linkages for compound 7 during microsecond MD simulations.

Table 1: Torsional parameters obtained based on QM calculations

Compound                                δ     k_φ,m      n
1 (Acetophenone)                        0     11.221     0
                                        0    -12.798     2
2 (Propenal)                            0     13.893     0
                                        0      3.361     1
                                        0    -15.036     2
3 (Propanal)                            0      5.24      0
                                        0     -1.924     1
                                        0     -2.637     3
4 (Ethenylbenzene)                      0      4.687     0
                                        0     -6.644     2
5 (Ethylbenzene)                        0      0.282     0
6 (Phenol)                              0      7.477     0
                                        0     -9.734     2
7 (Methoxybenzene)                      0      5.773     0
                                        0    -11.482     2
8 (Benzaldehyde)                        0     15.107     0
                                        0    -15.653     2
9 (6-phenyl-2,3-dihydropyran-4-one)     0      5.068     0
                                        0     -9.195     2

Table 2: NOESY contacts of compounds and inter-proton distances derived from microsecond MD simulations in water

Compound    NOE    Av. distance (Å)
5           1      4.11 ± 0.26
5           2      2.37 ± 0.83
5           3      2.86 ± 0.75
5           4*     5.08 ± 0.61
5           5*     3.52 ± 0.57
6           1      2.52 ± 0.75
6           2      4.28 ± 0.34
6           3      2.77 ± 0.74
6           4      4.06 ± 0.42
6           5*     4.96 ± 0.73
6           6*     3.62 ± 0.39
7           1      2.27 ± 0.89
7           2*     2.68 ± 0.26
7           3*     4.38 ± 0.53
3           1      4.39 ± 0.08
3           2      2.30 ± 0.85
3           3      2.47 ± 0.82
3           4      2.30 ± 0.84
3           5      2.54 ± 0.77
3           6*     2.85 ± 0.80
4           1      2.35 ± 0.84
4           2      4.02 ± 0.39
4           3*     2.37 ± 0.26
4           4      2.24 ± 0.25

*NOESY contacts that could be above 5 Å during the MD simulations. The average distances were computed as $\langle r^{-6} \rangle^{-1/6}$, respecting the NOE intensity for small molecules.
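The averaging named in the Table 2 footnote can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' analysis script: the function name `noe_average_distance` and the sample distances are hypothetical, and the inter-proton distance is assumed to have already been extracted for every trajectory frame.

```python
def noe_average_distance(distances):
    """Return the NOE-style effective distance <r^-6>^(-1/6).

    `distances` holds one positive inter-proton distance per frame
    (any length unit; the result carries the same unit).
    Hypothetical helper written for illustration only.
    """
    r = [float(d) for d in distances]
    if not r:
        raise ValueError("expected at least one distance")
    if any(d <= 0.0 for d in r):
        raise ValueError("distances must be strictly positive")
    # Average r^-6 over all frames, then invert the power.
    mean_inv6 = sum(d ** -6 for d in r) / len(r)
    return mean_inv6 ** (-1.0 / 6.0)


# Made-up per-frame distances in Å (not data from the simulations).
frames = [2.0, 2.0, 4.0, 4.0]
r_noe = noe_average_distance(frames)
r_mean = sum(frames) / len(frames)
print(f"<r^-6>^(-1/6) = {r_noe:.2f} Å, arithmetic mean = {r_mean:.2f} Å")
```

Because short distances dominate the r^-6 average, the effective distance lies below the arithmetic mean. This is consistent with the starred contacts, which may exceed 5 Å in individual frames while their averaged distance stays lower.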
Table 3: Relative abundance of different conformations of chalcones during microsecond MD simulations
Compound: 1, 3, 4
Condition: MD in CHCl3, MD in Water, MD in CHCl3, MD in Water, MD in DMSO, MD in Water
Dihedral 1 (degrees): 0, 0, 0, 0, 90, 90, 90, 90, -140, 140, -140, -140, -140, 130, -140
Dihedral 2 (degrees): 0, -180, 0, -180, 180, 0, 180, 0, 0, 0, 0, 0, -180, 0, 0
Dihedral 3 (degrees): 180, 180, 180, 180, 180, 180, 180, 180, 0, 0, -140, 0, 0, 0, 140
φ (degrees): -60, -60, -60, -60, -60, -60, -60
ψ (degrees): -60, -60, -60, -60, -60, -60, -60
Abundance (%): 78, 22, 85, 15, 78, 22, 78, 22, 35, 32, 12, 23, 15, 9, 9

Table 4: Relative abundance of different conformations of flavonoids during microsecond MD simulations
Compound: 2, 5, 6, 7
Condition: MD in CHCl3, MD in Water, MD in CHCl3, MD in Water, MD in CHCl3, MD in Water, MD in DMSO, MD in Water
Dihedral angle (degrees): ±30, ±30, 150, -150, 150, -150, -150, 150, -150, 150, -150, -150, -150, -150
φ1 (degrees): -70, -70, -70, -70
ψ1 (degrees): 90, -90, 90, -90
φ2 (degrees): -100, -100, -100, -100
ψ2 (degrees): -80, -80, -80, -80
Abundance (%): 99, 99, 50, 45, 48, 42, 51, 49, 51, 49, 54, 41, 61, 35

5.3 Chapter III

The knowledge on the parameterization and conformational characterization of small ligands in solution accumulated during the development of Chapters I and II allowed us to apply our methodological strategies to the structural characterization of the synthetic ligand PIK-75, an inhibitor of glycogen synthase kinase (GSK-3β), demonstrating the real impact of the analytical approach developed in this thesis. In partnership with the Universidade Federal do Rio de Janeiro (UFRJ), the Universidade Federal do ABC (UFABC), and the University of Dortmund (Germany), the previously unreported crystal structure of GSK-3β in complex with PIK-75 was solved at a resolution of 2.6 Å. The torsional terms and the partial atomic charges of PIK-75 were parameterized as described in sections 4.4 and 4.6, respectively. The ligand was then simulated in aqueous solvent for 1.0 µs for conformational sampling. Our results showed two main conformations of free PIK-75 in aqueous solution, with frequencies of 46% and 54%. During the simulation, it was possible to confirm that PIK-75 adopted conformations very similar to its GSK-3β-bound conformation, and that this bound conformation closely resembles the minority conformational population in solution. Our results indicate that a conformational selection process takes place in the molecular recognition of the ligand by its target receptor.

Angewandte Chemie, Communications
Halogen Bonds
International Edition: DOI: 10.1002/anie.201804917
German Edition: DOI: 10.1002/ange.201804917

An Unusual Intramolecular Halogen Bond Guides Conformational Selection

Roberta Tesch, Christian Becker, Matthias Philipp Müller, Michael Edmund Beck, Lena Quambusch, Matthäus Getlik, Jonas Lategahn, Niklas Uhlenbrock, Fanny Nascimento Costa, Marcelo D. Polêto, Pedro de Sena Murteira Pinheiro, Daniel Alencar Rodrigues, Carlos Mauricio R. Sant’Anna, Fabio Furlan Ferreira, Hugo Verli, Carlos Alberto Manssour Fraga,* and Daniel Rauh*

Abstract: PIK-75 is a phosphoinositide-3-kinase (PI3K) α-isoform-selective inhibitor with high potency. Although published structure–activity relationship data show the importance of the NO2 and the Br substituents in PIK-75, none of the published studies could correctly determine the underlying reason for their importance. In this publication, we report the first X-ray crystal structure of PIK-75 in complex with the kinase GSK-3β. The structure shows an unusual U-shaped conformation of PIK-75 within the active site of GSK-3β that is likely stabilized by an atypical intramolecular Br···NO2 halogen bond. NMR and MD simulations show that this conformation presumably also exists in solution and leads to a binding-competent preorganization of the PIK-75 molecule, thus explaining its high potency.
We therefore suggest that the site-specific incorporation of halogen bonds could be generally used to design conformationally restricted bioactive substances with increased potencies.

The selective inhibition of kinases involved in a variety of signaling cascades has attracted increasing attention over the last 20 years. Major efforts undertaken by the pharmaceutical industry and academic researchers have led to highly potent inhibitors that are used clinically and have increased our understanding of the molecular mechanisms of kinases involved in different diseases.[1] PIK-75 (Figure 1) is one example of a small-molecule inhibitor with high potency and selectivity toward the α-isoform of the catalytic subunit of phosphoinositide-3-kinases (PI3Ks) and glycogen synthase kinase-3β (GSK-3β, IC50 = 10 nM),[2] but it also inhibits several other kinases (kinase profiling data are provided in Supplementary Table 1 in the Supporting Information). PI3Ks constitute a family of kinases involved in the production of the second messenger phosphatidylinositol-3,4,5-triphosphate, and PI3K mutations have been shown to be important in the development of cancer and other diseases,[3] thus making available inhibitors attractive targets for structure-guided improvement of their potency and selectivity. Several studies have attempted to predict the binding mode of PIK-75 within the active site of kinases, but they have led to different conclusions[2c,4] (and none has correctly predicted the conformation of PIK-75 within the active site as outlined below).
In this publication, we present the first cocrystal structure of PIK-75 in complex with GSK-3β, a kinase acting downstream of PI3Ks that phosphorylates many intracellular targets[3] and has been studied as a potential target for treatment of, for example, type II diabetes and Alzheimer disease.[5] We show that PIK-75 adopts a conformation that is stabilized by an intramolecular halogen bond of a type that has not been described before. MD simulations show that this binding-competent conformation is also present in solution, thus explaining the high potency of the inhibitor.

[*] Dr. R. Tesch, Dr. C. Becker, Dr. M. P. Müller, M. Sc. L. Quambusch, Dr. M. Getlik, M. Sc. J. Lategahn, M. Sc. N. Uhlenbrock, Prof. D. Rauh
Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Strasse 4a, 44227 Dortmund (Germany)
E-mail: daniel.rauh@tu-dortmund.de
Dr. R. Tesch, M. Sc. P. d. S. M. Pinheiro, M. Sc. D. A. Rodrigues, Prof. C. M. R. Sant’Anna, Prof. C. A. M. Fraga
Laboratório de Avaliação e Síntese de Substâncias Bioativas (LASSBio), Instituto de Ciências Biomédicas, Universidade Federal do Rio de Janeiro, Av. Carlos Chagas Filho, 373, CEP 21941-902, Rio de Janeiro (Brazil)
E-mail: cmfraga@ccsdecania.ufrj.br
Dr. M. E. Beck
Bayer AG, division Crop Science, Alfred-Nobel-Strasse 50, 40789 Monheim am Rhein (Germany)
Dr. F. N. Costa, Prof. F. F. Ferreira
Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, São Paulo (Brazil)
M. Sc. M. D. Polêto, Prof. H. Verli
Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves, 9500, Porto Alegre (Brazil)
Prof. C. M. R. Sant’Anna
Departamento de Química, Instituto de Ciências Exatas, Universidade Federal Rural do Rio de Janeiro, Seropédica (Brazil)
Supporting information (including experimental details) and the ORCID identification number(s) for the author(s) of this article can be found under: https://doi.org/10.1002/anie.201804917.
© 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. Angew. Chem. Int. Ed. 2018, 57, 9970–9975

Figure 1. Structure of PIK-75 in complex with the kinase GSK-3β. (A) Overview of GSK-3β (amino acids 26–393, UniProt ID P49841, PDB ID 6GN1). The hinge region (orange), glycine-rich loop (blue), αC-helix (red), and the activation loop (green) as well as PIK-75 (yellow) are highlighted. (B) The major interactions of PIK-75 with Phe 67, Lys 85, and Val 135 are indicated (the QR code can be used to visualize the structure in augmented reality[6]). (C) Chemical structure of PIK-75. (D) Stereoview of PIK-75; the Fo−Fc simulated annealing omit map (σ = 3.0, green) and the anomalous map (σ = 5.0, magenta; diffraction data were collected close to the Br absorption edge at λ = 0.91883 Å) are shown.

In order to gain insight into the structural interplay within the kinase domain of GSK3, we co-crystallized PIK-75 in complex with GSK-3β (residues 26–393; PDB ID 6GN1). A dataset was collected from a single crystal and the structure was solved by molecular replacement (resolution 2.6 Å, Rwork = 22.2 %, Rfree = 25.9 %, Supplementary Table 2). The asymmetric unit consists of two molecules of GSK-3β, both of which are phosphorylated at Tyr 216 and show a very similar overall conformation (one molecule is shown in Figure 1 A). Additional electron density observed within the active site could be clearly and unambiguously modeled with PIK-75. The correct orientation of PIK-75 (Figure 1 C) was furthermore verified by the strong anomalous signal resulting from the Br atom present in the inhibitor (Figure 1 D).
Interactions were observed between one of the PIK-75 imidazo[1,2-a]pyridine nitrogen atoms and the GSK-3β Val 135 backbone NH group (hinge region), similar to those formed by the adenine ring of ATP and by other ATP-competitive inhibitors.[7] Additionally, the NO2 group of PIK-75 interacts with the charged ε-amino group of the catalytic residue Lys 85, and its 2-methyl-5-nitrophenyl group interacts via π-stacking with the Phe 67 ring within the glycine-rich loop (Figure 1 B). Most notable, however, is the unusual U-shaped conformation of PIK-75 within the active site, which was not predicted by any of the earlier modeling studies of the binding mode of the inhibitor within the active site of kinases.[2c,4] This unusual conformation of PIK-75 allows the C−Br bond to point toward the plane of the NO2 group, with a C−Br···N angle of ≈120° and a distance of 3.3 Å between the bromine and nitrogen atoms. This distance is less than the sum of their van der Waals radii (Supplementary Figure 1), indicating a previously unknown type of intramolecular halogen bond between the Br and the NO2 group. In the intermolecular C−X···NO2 (X = F, Cl, Br, I) interactions previously reported in the literature, the halogen atom is in close proximity to one or both of the oxygen atoms, giving a three-centered bifurcated system. This type of interaction can be further characterized as symmetric bifurcated, asymmetric bifurcated, or monocoordinated.[8] The intramolecular C−Br···NO2 interaction observed in the current study does not fit any of these categories because the connecting molecular scaffold restricts the possible relative orientations of the Br and the NO2 group. Recently, Zhang and co-workers emphasized the importance of intramolecular halogen bonds and their role in stabilizing a particular conformation of a molecule.[9] We thus speculated that PIK-75 might adopt the same U-shaped conformation in solution as that observed within the active site of GSK-3β.
To test this hypothesis, we first determined the crystal structure of PIK-75 by X-ray powder diffraction.[10] PIK-75 crystallized in a monoclinic (P2₁/n) crystal system with Z = 4 and Z′ = 1 (data statistics are shown in Supplementary Table 3) and indeed adopted a similar U-shaped conformation (Figure 2 and Supplementary Figure 1), indicating that this conformation is also preferred in the absence of the enzyme. It should be noted, however, that we observed a difference of ≈10° between the C–Br···NO2 angles in the GSK-3β-bound and unbound structures (Supplementary Figure 2 and Supplementary Table 4). The prearrangement of the inhibitor was additionally validated by two-dimensional NMR experiments (Figure 2 and Supplementary Figure 3), and the observed nuclear correlation between the benzyl moiety and the pyridine further validates the presence of the U-shaped conformer.

Figure 2. Small-molecule crystal structure and 2D NMR analysis of PIK-75. (A) Relative orientation and distances between Br and the NO2 group in the small-molecule crystal structure of PIK-75 (top) and the superposition of GSK-3β-cocrystallized PIK-75 (gray) and the small-molecule crystal structure (purple; bottom). (B) 1H NOESY spectrum of PIK-75 observed in [D6]DMSO, which reveals correlations between nuclei that are spatially close rather than connected through bonds. (C) NOESY cross-correlations for PIK-75 (green). The observed spatial interaction A corresponds to an intramolecular correlation between the benzylic proton at 8.73 ppm and the pyridine proton at 9.26 ppm, indicating a prearranged conformer of PIK-75.

In order to gain a better understanding of the factors that contribute to this conformation of PIK-75, we carried out ab initio post-Hartree–Fock calculations to evaluate existing intramolecular interactions. In addition, we used the corresponding des-nitro analogue to determine the importance of the nitro substituent for the interaction with the bromine atom. The geometries of both compounds were fully optimized in the gas phase according to second-order Møller–Plesset perturbation theory (MP2) using the 6-31+G* basis set,[11] followed by a single-point energy calculation at the CAM-B3LYP/6-311G(3d) level of theory. To study the intramolecular interaction from the orbital perspective, the analysis of natural bond orbitals (NBO) focused on the second-order perturbative estimation of donor–acceptor (bonding–antibonding) interactions. The NBO analysis resulted in a table with the stabilization energy E(2) for each donor–acceptor pair (bonding–antibonding orbitals) (see Methods in the Supporting Information). No stabilization energy E(2) between the nitro group and the bromine atom was listed, suggesting that this energy is below 0.5 kcal mol⁻¹ (the threshold for printing E(2) in the output file). This was further verified by fragmenting PIK-75 into two small representative parts, that is, 6-bromoimidazo[1,2-a]pyridine and 2-hydrosulfonyl-1-methyl-4-nitrobenzene, and repeating the NBO analysis. The result gave E(2) = 0.23 kcal mol⁻¹ between a lone pair of one of the nitro group oxygen atoms and the C–Br antibonding orbital (n(O)→σ*(C–Br)). This type of interaction is characterized by the formation of a so-called "σ-hole", first defined in the literature as the lowest-electron-density region along a C–X bond (X = F, Cl, Br, I)[12] that can participate in an attractive interaction with an electron-rich system (such as a lone pair or a π system). The weakness of the n(O)→σ*(C–Br) interaction led us to hypothesize that other factors additionally contribute to the intramolecular C–Br···NO2 interaction observed here.
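The second-order perturbative estimate behind the E(2) values discussed above is given in the Supporting Information as E(2) = q_i F(i,j)² / (ε_j − ε_i). The short Python sketch below evaluates that expression and applies the 0.5 kcal mol⁻¹ printing threshold. The occupancy, Fock matrix element and orbital energies are hypothetical numbers chosen only to land near the reported magnitude, not values from the NBO output, and the conversion assumes all inputs are in atomic units.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor
PRINT_THRESHOLD_KCAL = 0.5          # default NBO printing threshold quoted in the text


def nbo_e2(q_i, f_ij, eps_i, eps_j):
    """Second-order stabilization energy E(2) in kcal/mol.

    q_i   : donor orbital occupancy
    f_ij  : off-diagonal NBO Fock matrix element (hartree)
    eps_i : donor orbital energy (hartree)
    eps_j : acceptor orbital energy (hartree)
    """
    gap = eps_j - eps_i
    if gap <= 0:
        raise ValueError("acceptor orbital must lie above the donor orbital")
    return q_i * f_ij ** 2 / gap * HARTREE_TO_KCAL_PER_MOL


def is_listed(e2_kcal, threshold=PRINT_THRESHOLD_KCAL):
    """True if the interaction would appear in the printed E(2) table."""
    return e2_kcal >= threshold


# Hypothetical inputs for a weak lone-pair -> sigma* donation.
e2 = nbo_e2(q_i=2.0, f_ij=0.012, eps_i=-0.40, eps_j=0.40)
print(f"E(2) = {e2:.2f} kcal/mol, listed: {is_listed(e2)}")
```

An interaction of this size falls below the threshold and is therefore absent from the default table, which is why the fragment calculation was needed to expose it.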
The superposition of the imidazopyridine cores of PIK-75 and its des-nitro analogue revealed a conformational difference, with the phenyl ring shifted by 2.2 Å (Figure 3 A and Supplementary Table 4). The electrostatic potential maps around the C–Br bond showed the formation of the σ-hole in both molecules, and the nature of this noncovalent interaction was evaluated using the maximal and minimal electrostatic potentials on the surface map (VS,max and VS,min).[13] The VS,max in the σ-hole region varied by 4.3 kcal mol⁻¹ between PIK-75 and the des-nitro analogue, showing that the addition of the nitro group in PIK-75 changes the charge distribution around the σ-hole region (Figure 3 B). The electron density along the C–Br bond, as expected, also differs between PIK-75 and the des-nitro analogue, as indicated by the presence of a more negatively charged region in the surface map of PIK-75 (VS,min = −13.9 kcal mol⁻¹ vs. VS,min = −11.0 kcal mol⁻¹ for the des-nitro analogue) (Figure 3 C,D). In PIK-75, the region around the nitrogen atom of the nitro group shows a VS,max of 19.4 kcal mol⁻¹ and points toward the axis of the C–X bond (Figure 3 C). The electrostatic potential surface map shows that the nitrogen atom could act as an electrophile interacting with the extension of the C–Br bond, which has a negative potential in comparison to the σ-hole, thus forming a classical dipole–dipole interaction. These findings are further supported by QTAIM analysis of the electron density, as the QTAIM graph features a bond critical path connecting bromine with the nitro function (Supplementary Figure 6). After identifying that the intramolecular halogen bond in PIK-75 was also driven by classical electrostatic forces, we performed molecular dynamics (MD) simulations for 1 μs in aqueous solution to investigate the stability of the proposed interaction of PIK-75.
Figure 3. Electronic properties of PIK-75 and the des-nitro analogue. (A) Overlay of the ground-state conformations of PIK-75 (gray) and the des-nitro analogue (cyan). (B) Electrostatic potential maps for PIK-75 (top) and the des-nitro analogue (bottom), highlighting the shape of the σ-hole and the maximal energy values of the surface for that particular region (VS,max). (C) Cross-section of the electrostatic potential surface of PIK-75, highlighting the positive region of the nitro group and its influence on the σ-hole region of the bromine atom, described in terms of VS,max. (D) Cross-section of the electrostatic potential surface of the des-nitro analogue, highlighting the stronger σ-hole region of the bromine atom, described in terms of VS,max. Surfaces were calculated at the MP2/6-31+G* level of theory (IsoValue = 0.002).

The statistical analysis of the distribution of torsional angles throughout the simulation revealed only two conformational populations, with a distribution of 46 % and 54 % for the minor and major populations, respectively. The main difference between these two ensembles is the rotation of the dihedral angle φ3 (H3C-N-N=CH), with average angles of −10° (major) and 90° (minor) (Figure 4 A,B). The RMSD values calculated for each frame of the trajectory against the cocrystallized conformation of PIK-75 in complex with GSK-3β and against the small-molecule crystal structure (Figure 4 C) suggest that conformations similar to the GSK-3β-cocrystallized conformation exist in solution, as indicated by the high frequency of conformations with an RMSD < 1.5 Å. Here, the minor conformational ensemble of PIK-75 is more closely related to the cocrystallized conformation than the major ensemble (Supplementary Figure 4). The Br···NO2 interaction was also investigated during the molecular dynamics simulation.
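The two-population split of φ3 described above can be sketched as a nearest-center assignment on the circle. This is only an illustration of the idea: the source does not describe how the populations were actually separated, the centers of −10° and 90° are taken from the reported averages, and the angle list is made-up toy data rather than trajectory output.

```python
def circular_diff(a, b):
    """Smallest signed difference a - b in degrees, mapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def assign_population(phi, centers=(-10.0, 90.0)):
    """Return the center closest to phi, measured on the circle."""
    return min(centers, key=lambda c: abs(circular_diff(phi, c)))


def population_fractions(angles, centers=(-10.0, 90.0)):
    """Fraction of frames assigned to each center."""
    counts = {c: 0 for c in centers}
    for phi in angles:
        counts[assign_population(phi, centers)] += 1
    total = len(angles)
    return {c: counts[c] / total for c in centers}


# Toy dihedral values in degrees (hypothetical, not from the simulation).
# 350 degrees is equivalent to -10 degrees and must land in the first population.
toy_phi3 = [-15.0, -5.0, 350.0, 85.0, 100.0, 95.0, -20.0, 92.0, 88.0, 0.0]
print(population_fractions(toy_phi3))
```

Measuring differences on the circle matters because dihedral angles wrap at ±180°, so a plain numeric difference would misassign values such as 350°.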
Our results revealed an average Br···NO2 distance of ≈4 Å and a C–Br···N angle of ≈105° for the minor population, while the major population has an average Br···NO2 distance of ≈6 Å and a C–Br···N angle of ≈85°. These results underline the similarity between the minor population and the cocrystallized conformation and suggest that the torsions, angles, and distances required for the Br···NO2 interaction are not only possible, but frequently adopted in aqueous solution.

Figure 4. Molecular dynamics simulations of the conformational space adopted by PIK-75 in solution. (A) Dihedral composition of PIK-75 and relative abundance throughout the molecular dynamics (MD) simulation reveal a bimodal distribution of φ3 (green). (B) The two most prevalent conformational populations with φ3 = −10° (top) and φ3 = 90° (bottom). (C) RMSD calculations of the PIK-75 structural ensemble during MD simulations compared to the conformation of PIK-75 observed in complex with GSK-3β (gray) and in the small-molecule structure of PIK-75 (purple). The similarity, especially to the conformation observed in the cocrystal structure with GSK-3β, reinforces the idea that PIK-75 adopts a similar conformation also in solution.

Halogen bonding interactions have recently attracted considerable interest for the development of molecules with enhanced biological activity.[14] The International Union of Pure and Applied Chemistry (IUPAC) defines a halogen bond as an attractive interaction between an electrophilic region associated with a halogen atom and a nucleophilic region in another, or the same, molecular entity.[15] Compounds capable of forming halogen bonds can facilitate the formation of short contacts between the carbonyl and aromatic moieties of backbone amino acid residues, leading to pronounced changes in the selectivity,[14a] as well as the conformation of the kinase.[14b] In the present study, we report a new crystal structure of GSK-3β bound to the inhibitor PIK-75 that is stabilized by an unusual intramolecular halogen bond in a binding-competent conformation. In subsequent experiments, we confirmed that this conformation is not only adopted within the active site of GSK-3β, but also present in solution, in agreement with previous studies showing that small molecules frequently adopt the biologically active conformation even in the absence of the target enzyme.[16] To date, studies of halogen bonds in small-molecule crystals have mainly focused on the analysis of intermolecular interactions between donor–acceptor compounds,[17] leading to the classical interpretation of the directionality of the halogen bond. However, it is not surprising that the situation is different in intramolecular interactions involving halogen bonds (some examples of intramolecular C–X···NO2 interactions observed in molecules retrieved from the Cambridge Structural Database (CSD) are shown in Supplementary Figure 5). In this study, we observed the mixed nature of the PIK-75 C–X···NO2 intramolecular halogen bond interaction, with contributions from the classical σ-hole interaction (although weak) in addition to a dipole–dipole interaction between the nitro group nitrogen atom and the elongation of the C–Br bond. Although the IUPAC definition of a halogen bond only accounts for interactions with the σ-hole region,[15] Politzer and co-workers explain the nature of the halogen bond as Coulombic interactions that involve not only a direct interaction through the σ-hole, but also an interaction with the electron-dense regions along the C–X bond.[18] Thus, our finding that the intramolecular halogen bond of PIK-75 is not driven exclusively by an interaction with the σ-hole region, but by the extension of the entire C–Br bond, is in agreement with their work.

To our knowledge, this study on PIK-75 is the first to show the importance of an intramolecular halogen bond in stabilizing the molecule in a binding-competent conformation and reducing the entropic penalty upon binding, thus for the first time explaining the high potency of this molecule.

Acknowledgements

This work was co-funded by Fundação de Apoio à Pesquisa do Rio de Janeiro (FAPERJ) (Grant Nos. E-26/202.918/2015, E-26/200.037/2014), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (Grant No. 2015/26233-7), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (Grant Nos. 304872/2013-0, 307664/2015-5, 302861/2014-9, 402289/2013-7, and 311291/2015-5), Instituto Nacional de Fármacos e Medicamentos (INCT-INOFAR) (Grant No. 465.249/2014), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS), the German Federal Ministry for Education and Research (NGFNPlus and e:Med) (Grant Nos. BMBF 01GS08104, 01ZX1303C), and the Deutsche Forschungsgemeinschaft (DFG). D.R. thanks the German state of North Rhine-Westphalia (NRW) and the European Union (European Regional Development Fund: Investing In Your Future) (EFRE-800400).
Conflict of interest

The authors declare no conflict of interest.

Keywords: glycogen synthase kinase-3β · halogen bonds · molecular dynamics · PIK-75 · structure elucidation

How to cite: Angew. Chem. Int. Ed. 2018, 57, 9970–9975; Angew. Chem. 2018, 130, 10120–10126

[1] P. Cohen, D. R. Alessi, ACS Chem. Biol. 2013, 8, 96–104.
[2] a) C. Grütter, J. R. Simard, S. C. Mayer-Wrangowski, P. H. Schreier, J. Pérez-Martin, A. Richters, M. Getlik, O. Gutbrod, C. A. Braun, M. E. Beck, D. Rauh, ACS Chem. Biol. 2012, 7, 1257–1267; b) M. Hayakawa, H. Kaizawa, K. Kawaguchi, N. Ishikawa, T. Koizumi, T. Ohishi, M. Yamano, M. Okada, M. Ohta, S. Tsukamoto, F. I. Raynaud, M. D. Waterfield, P. Parker, P. Workman, Bioorg. Med. Chem. 2007, 15, 403–412; c) Z. H. Zheng, S. I. Amran, P. E. Thompson, I. G. Jennings, Mol. Pharmacol. 2011, 80, 657–664.
[3] L. C. Cantley, Science 2002, 296, 1655–1657.
[4] a) R. Frédérick, W. A. Denny, J. Chem. Inf. Model. 2008, 48, 629–638; b) M. Han, J. Z. H. Zhang, J. Chem. Inf. Model. 2010, 50, 136–145; c) Y. Li, Y. Wang, F. Zhang, J. Mol. Model. 2010, 16, 1449–1460; d) D. A. Sabbah, J. L. Vennerstrom, H. Z. Zhong, J. Chem. Inf. Model. 2010, 50, 1887–1898.
[5] E. Beurel, S. F. Grieco, R. S. Jope, Pharmacol. Ther. 2015, 148, 114–131.
[6] P. Wolle, M. P. Müller, D. Rauh, ACS Chem. Biol. 2018, 13, 496–499.
[7] J. A. Bertrand, S. Thieffine, A. Vulpetti, C. Cristiani, B. Valsasina, S. Knapp, H. M. Kalisz, M. Flocco, J. Mol. Biol. 2003, 333, 393–407.
[8] a) F. H. Allen, J. P. M. Lommerse, V. J. Hoy, J. A. K. Howard, G. R. Desiraju, Acta Crystallogr. Sect. B 1997, 53, 1006–1016; b) C. V. Ramana, Y. Goriya, K. A. Durugkar, S. Chatterjee, S. Krishnaswamy, R. G. Gonnade, CrystEngComm 2013, 15, 5283–5300.
[9] Y. C. Zhang, Y. X. Lu, Z. J. Xu, H. R. Ding, W. H. Wu, H. L. Liu, Struct. Chem. 2016, 27, 907–917.
[10] a) F. N. Costa, T. F. da Silva, E. M. B. Silva, R. C. R. Barroso, D. Braz, E. J. Barreiro, L. M. Lima, F. Punzo, F. F. Ferreira, RSC Adv. 2015, 5, 39889–39898; b) F. N. Costa, F. F. Ferreira, T. F. da Silva, E. J. Barreiro, L. M. Lima, D. Braz, R. C. Barroso, Powder Diffr. 2013, 28, S491–S509.
[11] M. Head-Gordon, J. A. Pople, M. J. Frisch, Chem. Phys. Lett. 1988, 153, 503–506.
[12] R. Wilcken, M. O. Zimmermann, A. Lange, A. C. Joerger, F. M. Boeckler, J. Med. Chem. 2013, 56, 1363–1388.
[13] J. S. Murray, P. Lane, P. Politzer, J. Mol. Model. 2009, 15, 723–729.
[14] a) O. Fedorov, K. Huber, A. Eisenreich, P. Filippakopoulos, O. King, A. N. Bullock, D. Szklarczyk, L. J. Jensen, D. Fabbro, J. Trappe, U. Rauch, F. Bracher, S. Knapp, Chem. Biol. 2011, 18, 67–76; b) J. Poznański, M. Winiewska, H. Czapinska, A. Poznańska, D. Shugar, Acta Biochim. Pol. 2016, 63, 203–214.
[15] G. R. Desiraju, P. S. Ho, L. Kloo, A. C. Legon, R. Marquardt, P. Metrangolo, P. Politzer, G. Resnati, K. Rissanen, Pure Appl. Chem. 2013, 85, 1711.
[16] C. R. Groom, J. C. Cole, Acta Crystallogr. Sect. D 2017, 73, 240–245.
[17] a) I. Alkorta, I. Rozas, J. Elguero, J. Phys. Chem. A 1998, 102, 9278–9285; b) S. V. Rosokha, C. L. Stern, J. T. Ritzert, Chem. Eur. J. 2013, 19, 8774–8788; c) G. Cavallo, P. Metrangolo, R. Milani, T. Pilati, A. Priimagi, G. Resnati, G. Terraneo, Chem. Rev. 2016, 116, 2478–2601.
[18] P. Politzer, J. S. Murray, T. Clark, Top. Curr. Chem. 2015, 358, 19–42.

Manuscript received: April 27, 2018
Revised manuscript received: June 1, 2018
Accepted manuscript online: June 6, 2018
Version of record online: July 9, 2018

Supporting Information

An Unusual Intramolecular Halogen Bond Guides Conformational Selection

Roberta Tesch, Christian Becker, Matthias Philipp Müller, Michael Edmund Beck, Lena Quambusch, Matthäus Getlik, Jonas Lategahn, Niklas Uhlenbrock, Fanny Nascimento Costa, Marcelo D.
Polêto, Pedro de Sena Murteira Pinheiro, Daniel Alencar Rodrigues, Carlos Mauricio R. Sant'Anna, Fabio Furlan Ferreira, Hugo Verli, Carlos Alberto Manssour Fraga,* and Daniel Rauh*

Abstract: PIK-75 is a phosphoinositide-3-kinase (PI3K) α-isoform-selective inhibitor with high potency. Although published SAR data show the importance of the NO2 and the Br substituents in PIK-75, none of the published studies could correctly assign the underlying reason for their importance. In this publication, we report the first X-ray crystal structure of PIK-75 in complex with the kinase GSK-3β. The structure shows an unusual U-shaped conformation of PIK-75 within the active site of GSK-3β that is likely stabilized by an intramolecular Br···NO2 halogen bond. MD simulations show that this conformation presumably also exists in solution and leads to a binding-competent pre-configuration of the PIK-75 molecule, thus explaining its high potency. We therefore suggest that the site-specific incorporation of halogen bonds could be generally used to design conformationally restricted bioactive substances with increased potencies.

DOI: 10.1002/anie.2016XXXXX

Table of Contents

Experimental Procedures
  Construct Design of human glycogen synthase kinase-3β
  Protein Expression and Purification
  Crystallization and Structure Determination
  Crystal Structure Determination using X-ray powder diffraction of PIK-75
  Molecular Dynamics of PIK-75 in water solvent
  Ab Initio calculations of PIK-75 and the non-nitrated analogue
  QTAIM Analysis of PIK-75 electron density
  Synthesis of PIK-75
Results and Discussion
  Figure S1. Relative orientation and distances between the Br and the NO2 group in PIK-75 co-crystallized with GSK-3β.
  Figure S2. Crystal packing of PIK-75.
  Figure S3. PIK-75 NMR spectra in DMSO-d6.
  Figure S4. Comparison of the two main PIK-75 conformations observed in the MD simulation and the conformation of PIK-75 observed in complex with GSK-3β.
  Figure S5. Small-molecule crystal structures retrieved from the Cambridge Structural Database (CSD).
  Figure S6. QTAIM graph of PIK-75.
  Table S1. Overview of the inhibitory potency of PIK-75 on different kinases.
  Table S2. Data collection and refinement statistics for GSK-3β in complex with PIK-75.
  Table S3. Data collection and refinement statistics for PIK-75.
  Table S4. Geometric parameters of PIK-75 and the des-nitro analogue obtained via different methods.
  Scheme S1. Synthesis of PIK-75.
References
Author Contributions
Experimental Procedures

Construct Design of human glycogen synthase kinase-3β

For the crystallization construct, the DNA encoding the kinase domain of human GSK-3β (UniProt entry P49841, residues 26-393) was synthesized by GeneArt (Life Technologies), including an N-terminal polyhistidine tag and a recognition site for tobacco etch virus (TEV) protease (MGHHHHHHGENLYFQG). The construct was cloned into the pIEx/Bac3 expression vector (Merck Millipore), using NcoI and Bsu36I restriction sites, for use in the BacMagic expression system (Merck Millipore). Transfection, virus generation, virus amplification and preparative-scale expression were carried out in the Spodoptera frugiperda cell line Sf9 following the BacMagic protocol.

Protein Expression and Purification

GSK-3β was expressed in Sf9 cells using the BacMagic system. Following protein expression, the cells were harvested (3000 × g, 10 min), resuspended in buffer A (50 mM TRIS, 500 mM NaCl, 10 % glycerol, 1 mM DTT, pH 8, including Roche protease inhibitor cocktail) and homogenized in a French press. The lysate was cleared by centrifugation at 40,000 × g for 1 h at 8 °C and loaded on a column packed with Ni-NTA Superflow resin (Qiagen). The elution was performed with a gradient of buffer B (buffer A + 500 mM imidazole) from 0 to 250 mM imidazole over 30 min. TEV cleavage was carried out by dialysis against buffer C (20 mM TRIS, 10 mM NaCl, 10 % glycerol, 1 mM EDTA, 1 mM DTT, pH 8) overnight at 8 °C. Following the cleavage reaction, the protein solution was loaded on a HiTrap-Q-HP column (GE Healthcare). GSK-3β was eluted with buffer D (buffer C + 1 M NaCl) with a gradient of 0–250 mM NaCl over 30 min. For the final purification step, fractions containing GSK-3β were concentrated and applied to a HiLoad 16/60 Superdex 75 pg column (GE Healthcare) in buffer E (25 mM TRIS, 250 mM NaCl, 10 % glycerol, pH 8).
Fractions with purified GSK-3β were concentrated to 5 mg·mL⁻¹ and stored at −80 °C until further use. Protein identity was confirmed by ESI-MS (electrospray ionization mass spectrometry) analysis.

Crystallization and Structure Determination

The inhibitor PIK-75 was co-crystallized with GSK-3β by incubating 5 mg/mL protein with a 2-fold molar excess of inhibitor (10 mM stock in DMSO) for 1 h at 4 °C to allow enzyme-inhibitor complex formation prior to crystallization. Crystals grew using the hanging drop vapor diffusion method at 20 °C after mixing 1 μL protein-inhibitor solution with 1 μL reservoir solution (0.1 M HEPES pH 7.0, 22 % v/v PEG 8000, 8 % ethylene glycol). The data set was collected at the PXII X10SA beamline of the Swiss Light Source (PSI, Villigen, Switzerland), indexed and integrated with XDS, and scaled using XSCALE[1]. The structure of PIK-75/GSK-3β was solved by molecular replacement with PHASER[2] using PDB entry 4J1R as template. The GSK-3β molecules in the asymmetric unit were manually adjusted using the program COOT[3]. The refinement was performed with phenix.refine[4]. Inhibitor topology files were generated using the Dundee PRODRG2 server[5]. The refined structure was validated with the PDB validation server. Data collection, structure refinement statistics and the PDB-ID code are provided in Table S2. PyMOL[6] was used for generating figures.

Crystal Structure Determination using X-ray powder diffraction of PIK-75

To determine the crystal structure of PIK-75, X-ray powder diffraction data were collected using a STADI-P powder diffractometer (Stoe®, Darmstadt, Germany) in transmission geometry with monochromatic radiation (Cu Kα1, λ = 1.54056 Å) selected by a curved Ge (111) crystal. A silicon microstrip detector, Mythen 1K (Dectris®, Baden, Switzerland), was used to register the collected intensities in the range from 3 to 80°, with step sizes of 0.015° and 500 s of integration time at each 1.05°.
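To put the scan range above into real-space terms, Bragg's law (nλ = 2d sin θ) converts each 2θ limit into an interplanar spacing. The sketch below uses the Cu Kα1 wavelength and the 3–80° range quoted in the text; the function name is illustrative, and first-order diffraction (n = 1) is an assumption.

```python
import math

CU_K_ALPHA1 = 1.54056  # wavelength in angstroms, as quoted in the text


def d_spacing(two_theta_deg, wavelength=CU_K_ALPHA1, order=1):
    """Interplanar spacing d (angstroms) from Bragg's law, n*lambda = 2*d*sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength / (2.0 * math.sin(theta))


# Limits of the measured 2-theta range.
for two_theta in (3.0, 80.0):
    print(f"2theta = {two_theta:5.1f} deg -> d = {d_spacing(two_theta):.3f} A")
```

Under these assumptions the scan covers spacings from roughly 29 Å at the low-angle end down to about 1.2 Å at 80°.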
The sample, prepared as a fine powder, was conditioned between two acetate-cellulose foils and the sample holder was kept spinning during the measurement. In summary, the crystal structure determination procedure is divided into the following steps: X-ray diffraction pattern indexing, symmetry choice, structure solution and refinement. During the indexing step, the first 20 reflections of the pattern were fitted using TOPAS-Academic v.5 to obtain the unit cell parameters[7]. Combining this information with the systematic absences, the space group was found. To confirm that the choice of space group and unit cell parameters was correct, a Pawley refinement was carried out (Rwp = 3.352 %, Rexp = 0.866 and χ² = 3.870). The values found via TOPAS-Academic were used in conjunction with the chemical structure in the process of crystal structure determination by means of a simulated annealing (SA) algorithm implemented in the DASH software[8]. Around 15 SA runs were globally optimized and the best result was used in the final structure refinement by the Rietveld method in TOPAS-Academic v.5, providing a satisfactory fit and good final R-factors and quality-of-fit indicators (Rwp = 3.980 %, Rexp = 1.268 and χ² = 3.140). The final solution and the crystal structure parameters of PIK-75 are presented in Figure S2 and Table S3. More detailed information about the complete procedure can be found in previous reports[9].

Molecular Dynamics of PIK-75 in water solvent

Bonded and non-bonded parameters for PIK-75 were generated following the GROMOS philosophy[10]. Electrostatic potential (ESP) derived atomic partial charges were generated in Gaussian09 with second-order Møller–Plesset perturbation theory (MP2) using the 6-31G* basis set[11].
Small adjustments were carried out when translating atomic partial charges from the QM to the MM topology in order to maintain the overall direction and magnitude of the dipole moment. The torsional profiles were calculated for four new dihedrals using HF/6-31G*[12] in Gaussian09 and submitted to the RotProf server[13] (http://dqfnet.ufpe.br/biomat/rotprof/) to obtain MM parameters that yield the same torsional profiles in molecular dynamics calculations. Bond, angle and Lennard-Jones parameters were taken from similar atom types of the GROMOS53a6 force field[14]. The conformational ensemble of PIK-75 was simulated in water with the GROMACS 5.0.7[15] package, with the ligand placed in the center of a cubic simulation box of 125 nm³. The box was filled with SPC/E water[16] and minimized to avoid steric clashes. After minimization, the box was equilibrated in the NVT ensemble to normalize the thermal coupling at 298 K using the Nosé-Hoover algorithm[17]. Equilibration in the NPT ensemble was then carried out to normalize the pressure coupling at 1 bar using the Parrinello-Rahman algorithm[18]. A production run was carried out for 1 μs with a timestep of 2 fs, using the LINCS constraint algorithm[19] and the Particle Mesh Ewald (PME) method[20] with a buffer of 1.0 nm for both Coulomb and Lennard-Jones treatments and a Fourier grid spacing of 0.12 nm.

Ab Initio calculations of PIK-75 and the non-nitrated analogue

Ab initio calculations were performed in the Spartan'16 software (Wavefunction Inc.) using second-order Møller–Plesset perturbation theory (MP2) and the 6-31+G* basis set[21]. The ground-state conformations and their respective electrostatic potential surfaces were computed for both PIK-75 and its non-nitrated analogue. Single-point calculations and natural bond orbital (NBO)[22] analysis were performed in Gaussian09 revision D.01[23] using the CAM-B3LYP/6-311G(3d) level of theory[24].
The stabilization energy E(2) given by the NBO analysis is calculated via the formula

$$E(2) = \Delta E_{ij} = q_i \, \frac{F(i,j)^2}{\varepsilon_j - \varepsilon_i}$$

in which, according to the manual, $q_i$ is the donor orbital occupancy, $\varepsilon_j$ and $\varepsilon_i$ are diagonal elements (orbital energies) and $F(i,j)$ is the off-diagonal NBO Fock matrix element. Entries are included in the table only when the interaction energy exceeds the default threshold of 0.5 kcal/mol.

QTAIM Analysis of PIK-75 electron density

The wavefunction of PIK-75 at the MP2/6-31+G* level of theory was subjected to QTAIM (quantum theory of atoms in molecules)[25] analysis, using AIMStudio [AIMAll (Version 17.11.14), Todd A. Keith, TK Gristmill Software, Overland Park KS, USA, 2017 (aim.tkgristmill.com)] and applying default parameters for calculation as well as visualisation (see Figure S6).

Synthesis of PIK-75

All reagents and solvents were purchased from Acros, Alfa Aesar, Merck, Sigma-Aldrich or VWR and used without further purification. 1H and 13C NMR spectra were recorded on Bruker DRX 400 (400 MHz/101 MHz) and DRX 600 (600 MHz/150 MHz) instruments. 1H chemical shifts are reported in δ (ppm) as s (singlet), d (doublet), dd (doublet of doublets), t (triplet), q (quartet), m (multiplet) and b (broad singlet) and are referenced to the residual solvent signal of DMSO-d6 (2.50). 13C spectra are referenced to the residual solvent signal of DMSO-d6 (39.52). High-resolution electrospray ionization mass spectra (ESI-FTMS) were recorded on a Thermo LTQ Orbitrap (high-resolution mass spectrometer from Thermo Electron) coupled to an Accela HPLC system equipped with a Hypersil GOLD column (Thermo Electron). Preparative HPLC was conducted on an Agilent HPLC system (1200 series) with a VP 125/21 Nucleodur C18 column from Macherey-Nagel and monitored by UV at λ = 210 nm and 254 nm.

Results and Discussion

Figure S1. Relative orientation and distances between the Br and the NO2 group in PIK-75 co-crystallized with GSK-3β.
Figure S2. Crystal packing of PIK-75 showing four formula units per unit cell (Z = 4, Z′ = 1) (carbon atoms in gray) and the intermolecular halogen bond between two PIK-75 molecules from different asymmetric units (carbon atoms in green). The distance between the two Br atoms, represented by the blue dashed lines, is 3.5 Å.

Figure S3. PIK-75 NMR spectra in DMSO-d6 (1H NMR, 13C NMR, NOESY and HMBC). HMBC spectra (observed in DMSO-d6) and chemical structure of the inhibitor. The assigned HMBC correlations (red arrows) lead to the exact molecular structure. Corresponding chemical shifts of the protons are shown in blue.

Figure S4. Comparison of the two main PIK-75 conformations observed in the MD simulation to the conformation of PIK-75 observed in complex with GSK-3β. Distribution size is proportional to the population occupancy throughout the simulation. The minor population bears the stronger resemblance to the conformation of PIK-75 in complex with GSK-3β, as shown by its lower RMSD.

Figure S5. Small-molecule crystal structures retrieved from the Cambridge Structural Database (CSD). The CSD code is shown above each compound and the C–X···NO2 angle is shown between C–X and the nearest atom of the nitro group, highlighting the different directionalities observed for C–X···NO2 intramolecular interactions.

Figure S6. QTAIM graph of PIK-75. Nuclear critical points (nuclei) are shown as large spheres. Small green spheres show the positions of bond critical points. Bond critical paths are shown as tubular connections. Broken tubular lines indicate paths associated with BCP charges below 0.005 atomic units. The contour map shows the Laplacian of the electron density, mapped to the plane spanned by Br, one NO2 oxygen and the phenyl carbon the NO2 moiety is attached to.
Contour values: -1 (red, dashed), 0 (green), +1 (blue), values in atomic units.

Table S1. Overview of the inhibitory potency of PIK-75 on different kinases ([PIK-75] = 1000 nM; measurements were performed in duplicate and at the apparent ATP KM of each individual kinase employing the Thermo Fisher Scientific SelectScreen™ Profiling Service - Hot Spot).

| Kinase | % inhibition | Standard deviation |
|---|---|---|
| ABL1 | 21.2 | 4.9 |
| ACVR1B (ALK4) | -2.5 | 2.9 |
| AKT1 (PKB alpha) | 29.4 | 3.1 |
| AKT2 (PKB beta) | 10.1 | 0.6 |
| AKT3 (PKB gamma) | 43.4 | 0.5 |
| AMPK (A1/B2/G2) | 71.9 | 3.7 |
| AURKA (Aurora A) | 46.0 | 2.9 |
| BLK | 59.6 | 1.6 |
| BRSK1 (SAD1) | 70.7 | 0.5 |
| CAMK2A (CaMKII alpha) | 64.4 | 6.1 |
| CDK1/cyclin B | 97.8 | 1.1 |
| CDK17/cyclin Y | 89.2 | 2.3 |
| CDK18/cyclin Y | 98.8 | 0.0 |
| CDK2/cyclin A | 100.0 | 0.8 |
| CDK5/p25 | 104.4 | 0.9 |
| CDK5/p35 | 106.1 | 0.2 |
| CDKL5 | 43.5 | 4.1 |
| CHEK1 (CHK1) | 12.2 | 4.1 |
| CHEK2 (CHK2) | 26.2 | 7.3 |
| CLK1 | 103.3 | 0.5 |
| CLK2 | 105.7 | 2.2 |
| CLK3 | 96.6 | 0.4 |
| CSK | 11.2 | 3.0 |
| CSNK1A1 (CK1 alpha 1) | 32.5 | 3.2 |
| CSNK2A1 (CK2 alpha 1) | 18.9 | 5.5 |
| DNA-PK | 99.7 | 0.1 |
| DYRK1A | 100.8 | 0.8 |
| DYRK1B | 98.0 | 1.8 |
| DYRK3 | 96.9 | 0.1 |
| DYRK4 | 15.5 | 0.9 |
| EGFR (ErbB1) | 44.5 | 2.9 |
| EPHA2 | 36.5 | 0.3 |
| EPHA4 | 38.0 | 4.6 |
| EPHB4 | 21.0 | 4.5 |
| ERBB2 (HER2) | -0.5 | 0.5 |
| FGFR1 | 68.0 | 2.3 |
| FGFR2 | 41.2 | 0.8 |
| FGFR3 | 14.4 | 0.7 |
| FLT1 (VEGFR1) | 17.3 | 1.7 |
| FLT3 | 102.2 | 0.1 |
| FRAP1 (mTOR) | 43.3 | 1.0 |
| GSK3A (GSK3 alpha) | 95.7 | 0.5 |
| GSK3B (GSK3 beta) | 102.0 | 0.9 |
| IGF1R | 29.6 | 3.9 |
| INSR | 10.6 | 0.1 |
| IRAK4 | 84.7 | 3.6 |
| JAK3 | 93.9 | 1.0 |
| KDR (VEGFR2) | 77.0 | 2.3 |
| KSR2 | 0.9 | 0.0 |
| LCK | 82.8 | 0.1 |
| MAP3K19 (YSK4) | 99.6 | 0.9 |
| MAP3K9 (MLK1) | 94.0 | 0.7 |
| MAP4K2 (GCK) | 98.8 | 0.3 |
| MAP4K4 (HGK) | 101.9 | 0.5 |
| MAP4K5 (KHS1) | 77.7 | 3.6 |
| MAPK1 (ERK2) | 13.4 | 0.8 |
| MAPK11 (p38 beta) | 2.4 | 0.7 |
| MAPK12 (p38 gamma) | 83.3 | 3.0 |
| MAPK13 (p38 delta) | 65.6 | 6.1 |
| MAPK14 (p38 alpha) Direct | 10.7 | 4.3 |
| MAPK3 (ERK1) | 19.6 | 2.0 |
| MAPK7 (ERK5) | 4.3 | 0.1 |
| MAPKAPK2 | 23.2 | 3.0 |
| MAPKAPK3 | 12.1 | 2.6 |
| MAPKAPK5 (PRAK) | 18.3 | 0.8 |
| MARK1 (MARK) | 89.1 | 0.1 |
| MET (cMet) | 87.5 | 2.6 |
| MKNK1 (MNK1) | 18.0 | 4.4 |
| MST4 | -15.5 | 2.1 |
| MYLK2 (skMLCK) | 25.9 | 0.8 |
| NEK2 | 40.5 | 0.1 |
| NTRK1 (TRKA) | 90.8 | 0.6 |
| PAK2 (PAK65) | 32.8 | 4.6 |
| PAK4 | 53.1 | 2.1 |
| PDK1 Direct | 50.4 | 2.3 |
| PIM2 | 47.4 | 6.9 |
| PLK1 | 27.3 | 4.5 |
| PRKACA (PKA) | 95.1 | 2.3 |
| PRKCA (PKC alpha) | 87.1 | 3.7 |
| PRKCB1 (PKC beta I) | 55.3 | 5.1 |
| ROCK1 | 89.6 | 2.3 |
| ROCK2 | 59.0 | 5.6 |
| RPS6KA5 (MSK1) | 93.7 | 2.4 |
| SGK (SGK1) | 41.8 | 0.7 |
| SRC | 46.3 | 2.0 |
| SRPK1 | 20.9 | 0.3 |
| SRPK2 | 22.8 | 0.5 |
| TAOK2 (TAO1) | 23.5 | 0.1 |
| TEK (Tie2) | -3.2 | 0.5 |

Table S2. Data collection and refinement statistics for GSK-3β in complex with PIK-75 (PDB entry: 6GN1).

| Parameter | GSK-3β with PIK-75 |
|---|---|
| Data collection [a,b] | |
| Space group | P2₁ |
| Wavelength (Å) | 0.91883 |
| Cell dimensions a, b, c (Å) | 67.63, 119.49, 67.48 |
| Cell dimensions α, β, γ (°) | 90.00, 102.49, 90.00 |
| Resolution (Å) | 48.21-2.60 |
| No. of unique reflections | 32218 (3475) |
| Redundancy | 6.8 (7.0) |
| I / σI | 11.99 (2.2) |
| Completeness (%) | 99.9 (100.0) |
| Rmeas (%) | 13.1 (86.2) |
| Refinement | |
| Resolution (Å) | 44.30-2.60 |
| No. reflections | 32189 |
| Rwork / Rfree | 22.2 / 25.9 |
| No. atoms: protein | 5450 |
| No. atoms: ligand/ion | 54 |
| No. atoms: water | 109 |
| B-factors: protein | 58.9 |
| B-factors: ligand/ion | 52.8 |
| B-factors: water | 48.2 |
| R.m.s. deviations: bond lengths (Å) | 0.008 |
| R.m.s. deviations: bond angles (°) | 1.080 |
| Ramachandran plot: outliers (%) | 0 |
| Ramachandran plot: allowed (%) | 2.5 |
| Ramachandran plot: favored (%) | 97.5 |
| PDB ID | 6GN1 |

[a] Data collection statistics refer to merged Friedel pairs. [b] Diffraction data from a single crystal were used to determine the complex structure.

Table S3. Data collection and refinement statistics for PIK-75 (CCDC-ID: 1848022).

| Parameter | PIK-75 [a] |
|---|---|
| Crystal structure | |
| Space group | P2₁/n |
| a, b, c (Å) | 23.0247(2), 7.38618(9), 10.59344(12) |
| β (°) | 100.1890(7) |
| Volume (Å³) | 1773.16(3) |
| Z, Z′ | 4, 1 |
| ρcalc (g cm⁻³) | 1.69426(3) |
| μ (mm⁻¹) | 4.579 |
| T (K) | 298 |
| Data collection | |
| Diffractometer | STADI P |
| Monochromator | Ge(111) |
| Wavelength (Å) | 1.54056 |
| 2θ range (°) | 3-80 |
| Step size (°) | 1.05 |
| Time per step (s) | 500 |
| Refinement | |
| Number of data points | 4200 |
| Number of contributing reflections | 1115 |
| Number of restraints | 72 |
| Number of refined parameters | 287 |
| Rp (%) | 3.594 |
| Rexp (%) | 1.268 |
| Rwp (%) | 3.975 |
| RBragg (%) | 2.312 |
| χ² | 3.140 |

[a] Powder diffraction data were used to determine the crystal structure.

Table S4.
Geometric parameters of PIK-75 and the des-nitro analog obtained via different methods.

| Parameter | PIK-75 [a] | Des-nitro PIK-75 [a] | PIK-75 (MD) [b] | Small-molecule structure of PIK-75 |
|---|---|---|---|---|
| Distance C1'-Br (Å) | 3.5 | 3.9 | 5.3 | 3.6 |
| Distance Br-NO2 (Å) | 3.3 | - | 4.6 | 3.4 |
| Angle C-Br-NO2 (°) | 119 | - | 105.0 | 110.3 |
| Dihedral C1-C2-S3-O4 (°) | 179.7 | 179 | 151.7 | 179.5 |
| Dihedral C2-S3-N5-N7 (°) | 57.0 | 60.0 | 90.0 | 50.8 |
| Dihedral C6-N5-N7-C8 (°) | 18.1 | 28.3 | 90.0 | 0.2 |
| Dihedral N7-C8-C9-N10 (°) | 6.1 | 14.4 | 0.0 | 7.8 |

[a] Optimized structure from a quantum calculation at the MP2/6-31+G* level in the gas phase. [b] Representative conformation of the minority population (46%) generated by a 1 µs molecular dynamics simulation.

Scheme S1. Synthesis of PIK-75. Reagents and conditions: i) POCl3, DMF, 110-90 °C, 5 h, 19 %; ii) methylhydrazine, EtOH, reflux, 3 h; iii) 2-methyl-5-nitrobenzenesulfonyl chloride, pyridine, rt, 4 h, 10 % (over 2 steps).

Synthesis of 6-Bromoimidazo[1,2-a]pyridine-3-carbaldehyde (2). To a solution of 6-bromoimidazo[1,2-a]pyridine (1, 1 g, 5.1 mmol) in DMF (1.5 mL), a solution of phosphoryl chloride (813 µL, 8.9 mmol) in DMF (5.0 mL) was added dropwise. After addition of DMF (5 mL), the reaction mixture was stirred for 1 h at 110 °C, then for an additional 4 h at 90 °C. The suspension was cooled to room temperature and neutralized with NaOH (5 M). The crude product was extracted with DCM (6 x 10 mL), the combined organic fractions were dried over sodium sulfate and the solvent was evaporated. Silica column chromatography (90 % EtOAc/PE) yielded the title product as a white solid (214 mg, 0.95 mmol, 19 %). 1H NMR (400 MHz, DMSO-d6): δ 9.95 (s, 1H), 9.49 (s, 1H), 8.54 (s, 1H), 7.88-7.82 (m, 2H); 13C NMR (100 MHz, DMSO-d6) δ 179.38, 146.99, 146.90, 133.19, 127.72, 124.64, 118.64, 109.59. HRMS (ESI) (m/z): Calcd.: 224.96580 for C8H5BrN2O [M+H]+, found: 224.96597.

Synthesis of PIK-75 (3).
6-Bromoimidazo[1,2-a]pyridine-3-carbaldehyde (2, 100 mg, 0.4 mmol) and methylhydrazine (23 µL, 0.4 mmol) were dissolved in EtOH and stirred for 3 h under reflux. The solvent was removed in vacuo and the residue was dissolved in pyridine (3 mL). 2-Methyl-5-nitrobenzenesulfonyl chloride (104 mg, 0.4 mmol) was added and the reaction mixture was stirred for 4 h at room temperature. The crude product was washed with water and extracted with dichloromethane (3 x 30 mL). The combined organic fractions were dried over sodium sulfate and the solvent was evaporated. The product was obtained after preparative HPLC (10-100 % MeCN/H2O + 0.1 % TFA) as a white solid (18 mg, 0.04 mmol, 10 %). 1H NMR (600 MHz, DMSO-d6): δ 9.10 (d, J = 1.4 Hz, 1H), 8.74 (d, J = 2.5 Hz, 1H), 8.46 (dd, J = 2.5 Hz, 8.4 Hz, 1H), 8.31 (s, 1H), 8.06 (s, 1H), 7.79 (d, J = 8.5 Hz, 1H), 7.75 (d, J = 9.5 Hz, 1H), 7.62 (d, J = 9.6 Hz, 1H), 3.46 (s, 3H), 2.69 (s, 3H); 13C NMR (150 MHz, DMSO-d6) δ 145.50, 145.50, 144.70, 138.47, 136.74, 134.82, 134.58, 130.36, 128.16, 127.10, 124.80, 120.01, 118.06, 108.26, 31.53, 20.13. HRMS (ESI) (m/z): Calcd.: 452.00214 for C16H15BrN5O4S [M+H]+, found: 452.00214.

Author Contributions

Organic synthesis (DAR, MG, NU, JL), X-ray crystallography (RT, CB, MPM, DR), powder diffraction (FNC, FFF), MD simulations (MEB, MD, HV), quantum calculations (PSMP, CMRS, MEB), data analysis (all authors), drafting the manuscript (RT, MPM, DR, CAMF) and project administration (CAMF, DR).

5.4 Chapter IV

Throughout the doctorate, a deep understanding of force fields and their particularities was needed both to develop the studies that make up this thesis and to write the thesis itself. During this learning process, it became clear that adequate study material was lacking, both in Portuguese and at the undergraduate level. For this reason, and taking the opportunity to contribute to the 2nd edition of the book "Bioinformática da Biologia à Flexibilidade Molecular", a chapter was written specifically on force fields. The chapter will be released together with the second edition of the book, which will be made available online. The material has yet to be formatted in HTML. The chapter explains the simplification required in molecular mechanics (MM), defines and describes the potential function, and gives some of the scientific history behind the main force field families and their levels of resolution, together with the key information on each force field.

Force Fields

Marcelo Depólo Polêto
Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
http://www.ufrgs.br/bioinfo

[Advanced section in blue]

1 The atomic scale and its forces

Since the beginning of the twentieth century, the introduction of quantum theory has provided a new perspective for investigating and understanding the world at the atomic scale.
When we leave the visible scale and turn our attention to the nanometer scale (10^-9 meters), the laws of classical mechanics as we experience them in daily life (gravity, weight, etc.) can no longer explain the chemical phenomena observed experimentally. At such small atomic dimensions the laws of quantum mechanics (QM) prevail, in which nuclei, electrons and their spins, and atomic and molecular orbitals play a major role in the interactions between atoms and between molecules. In 1926, Schrödinger proposed the first mathematical model to treat the electron of the hydrogen atom under quantum effects (equation 1), which laid the foundation for much of quantum chemistry as we know it today.

$$-\frac{\hbar^2}{2\mu}\nabla^2\Psi(\mathbf{r},t) + V(\mathbf{r},t)\,\Psi(\mathbf{r},t) = i\hbar\,\frac{\partial\Psi(\mathbf{r},t)}{\partial t}$$ (1)

Here Ψ(r, t) is the wave function and V(r, t) is the potential energy, both as functions of position r and time t. In addition, ħ is the reduced Planck constant (ħ = h/2π), ∇² is the Laplace differential operator, and µ is the effective inertial mass. Schrödinger's model, however, becomes extremely complex when applied to many-electron systems, for which simplifications have been proposed over the years so that solutions could be found. For molecules, the Born-Oppenheimer approximation assumes that the electrons move much faster than the nuclei and can therefore adjust adiabatically to the nuclear trajectory. Mathematically, this allows the wave function to be split into its electronic, nuclear and vibrational components, which can then be computed separately. The methods used to solve the Schrödinger equation are the so-called ab initio quantum methods, which differ from one another essentially in the models used to compute the interactions between the molecular orbitals of the system under analysis.
Since then, quantum mechanics has grown rapidly and still provides Chemistry and Physics with substantial advances through mathematical models capable of predicting phenomena that were later observed (and some that are yet to be confirmed). With this in mind, it is important to stress that treating all the electrons of a given molecule has a high computational cost, which limits the size of the systems that can be analyzed.

It is in this context that the computational modeling of biomolecules such as proteins, carbohydrates, lipids and nucleic acids becomes a challenge. Because of the large number of atoms (and, consequently, electrons) in these biomolecules, the computational cost of modeling them quickly exceeds what is feasible. This complexity also affects the accuracy with which certain biological events are described, since the interactions between the components of the system (solvent, biomolecules, metal cofactors, etc.) must be computed simultaneously, and at different levels of detail of the quantum methods, in order to be modeled accurately.

There is also a time-scale issue that prevents QM from being used effectively to model biomolecules. In general, molecular motions such as folding, changes in secondary structure and signal transduction through molecular recognition occur on time scales ranging from nanoseconds (10^-9 seconds) to microseconds (10^-6 seconds). In contrast, the average time for electron displacement between covalently bonded atoms is in the femtosecond range (10^-15 seconds), and QM requires a temporal resolution in that same range. In other words, these structural events in biomolecules take place on a time scale between 1 million and 1 billion times slower than the events modeled in QM.
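The factors of 1 million and 1 billion quoted above follow directly from the ratio of the time scales. As a short worked check (the symbol N_steps is introduced here only for illustration and does not appear in the original text):

$$\frac{10^{-9}\ \text{s}}{10^{-15}\ \text{s}} = 10^{6}, \qquad \frac{10^{-6}\ \text{s}}{10^{-15}\ \text{s}} = 10^{9}$$

Read another way, following a single microsecond event at femtosecond resolution would take roughly N_steps = 10^-6 s / 10^-15 s = 10^9 successive evaluations, each of which would be a full quantum calculation.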
Therefore, using quantum methods to simulate large molecules would require an extremely long computation time, which makes them unfeasible for studying the conformational dynamics of biomolecules, precisely the reason these molecules are studied in the first place. For this purpose, a computationally more accessible ("cheaper") method is needed.

2 Classical mechanics: simplicity and speed

Given the difficulty of handling quantum-level detail when describing large biomolecules over long time scales, an elegant solution was to return to classical (Newtonian) mechanics. In this description, atoms are treated as perfectly spherical bodies, and the mathematical model describing the dynamics of atoms in molecules is Newton's second law:

$$\vec{F}_i = m_i\,\vec{a}_i$$ (2)

This equation can be written in terms of the gradient of the potential energy ϑ with respect to the position r_i of particle i:

$$\frac{d\vartheta}{d\vec{r}_i} = -m_i\,\frac{d^2\vec{r}_i}{dt^2}$$ (3)

Thus, atom i of mass m_i moves a distance dr_i in a time dt. Given the initial positions and velocities, integrating this equation over dt for all atoms describes the motion of the atoms in a given molecular system over time, a topic covered in chapter XIII on Molecular Dynamics.

Consider, for example, any atom i in a folded protein solvated in water. This atom interacts chemically with its surroundings, whether other atoms of the protein or of the solvent, and these interactions make up its potential energy ϑ_i. This potential energy can be computed by decomposing it into smaller energetic components related to how atom i interacts with its environment. The way in which we separate these energetic contributions to compute the potential energy ϑ of each atom in the system is the core of what we call the potential energy function.
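To make the integration of equation 3 more concrete, here is a minimal sketch in Python for a single particle in one dimension. The chapter does not name an integration scheme; the velocity Verlet algorithm used here, the harmonic potential V(x) = 0.5 k x², the reduced units and the function names are all assumptions chosen only for illustration.

```python
def velocity_verlet(x0, v0, mass, k, dt, n_steps):
    """Integrate m * d2x/dt2 = -dV/dx for the potential V(x) = 0.5 * k * x**2.

    Returns the list of positions (including the initial one) and the final velocity.
    """

    def acceleration(x):
        # The force is minus the gradient of the potential: F = -dV/dx = -k * x
        return -k * x / mass

    x, v = x0, v0
    a = acceleration(x)
    positions = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # advance the position by one step dt
        a_new = acceleration(x)             # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt      # advance the velocity with the averaged acceleration
        a = a_new
        positions.append(x)
    return positions, v


def total_energy(x, v, mass, k):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


if __name__ == "__main__":
    # Illustrative values in reduced units: roughly one oscillation period (2*pi)
    positions, v_final = velocity_verlet(x0=1.0, v0=0.0, mass=1.0, k=1.0, dt=0.01, n_steps=628)
    print(positions[-1], total_energy(positions[-1], v_final, 1.0, 1.0))
```

In a real simulation the same loop runs over every atom in three dimensions, and the acceleration comes from the full potential energy function rather than from a single spring.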
2.1 The potential energy function

Over the last 40 years, the form of potential energy functions has been the subject of extensive reflection on its benefits and limitations. Here we use a generic form to demonstrate the main concepts and their impact on the energy calculation itself, but the reader should keep in mind that variations of this form are common. The total potential energy (ϑ_total) of an atom i at a given instant determines its displacement in space. ϑ_total can be computed by decomposing it into potential energies related to covalent bonds (ϑ_covalent) and to interactions at a distance (ϑ_non-covalent). The term ϑ_covalent is the sum of ϑ_bond, ϑ_angle, ϑ_proper and ϑ_improper, which represent the potential energies related to the atoms bonded to atom i, while ϑ_non-covalent is the sum of ϑ_electrostatic and ϑ_Lennard-Jones, which represent the potential energies of the interactions that atom i makes at a distance with the atoms around it.

$$\vartheta_{\text{total}} = \vartheta_{\text{covalent}} + \vartheta_{\text{non-covalent}}$$ (4)

$$\vartheta_{\text{total}} = \vartheta_{\text{bond}} + \vartheta_{\text{angle}} + \vartheta_{\text{proper}} + \vartheta_{\text{improper}} + \vartheta_{\text{Lennard-Jones}} + \vartheta_{\text{electrostatic}}$$ (5)

$$
\begin{aligned}
\vartheta_{\text{total}} ={}& \frac{1}{2}k_b(b-b_0)^2 + \frac{1}{2}k_\theta(\theta-\theta_0)^2 + \frac{1}{2}k_\varphi\bigl(1+\cos(n\varphi+\gamma)\bigr) + \frac{1}{2}k_\xi(\xi-\xi_0)^2 \\
&+ \sum_{i}^{N}\sum_{j\neq i}^{N} 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \sum_{i}^{N}\sum_{j\neq i}^{N}\frac{q_i q_j}{4\pi\varepsilon_0 R_{ij}}
\end{aligned}
$$ (6)

In general, potential energies are computed with mathematical functions that describe the energy surface as a function of the position, angle, torsion or distance of the atoms involved in the interaction.

2.1.1 Bond-stretching potential: The potential energy of a covalent bond is often described by a Morse potential, as shown in Figure 2, which captures the energy minimum at the equilibrium distance and the dissociation energy of the atoms. However, covalent bonds rarely depart from this equilibrium distance under non-reactive conditions.
Therefore, a mathematical simplification is usually applied in most potential functions, and ϑ_bond is computed using Hooke's law for springs (a harmonic potential). To picture this, think of two carbons (spherical masses) connected by a spring whose length oscillates around an equilibrium distance of 0.1530 nm, representing a single covalent bond (Figure 2). Using a harmonic potential prevents covalent bonds from breaking, since moving the atoms too far apart causes a large increase in potential energy, which would bring considerable instability to the molecule. To prevent this, these bonds are kept constant in molecular mechanics.

2.1.2 Angle-bending potential: The angle potential energy ϑ_angle is also computed using Hooke's law. To picture it, imagine a spring between the hydrogen atoms of a water molecule whose length oscillates and, consequently, makes the angle oscillate around an equilibrium value of 109.5°. This simplification is well accepted because it fits experimental data, which show that an angle oscillates like a spring around an equilibrium position. Parameterizing the spring used in bond or angle potentials is relatively easy. Historically, it was done using crystallographic data, vibrational frequency data and, more recently, quantum mechanical calculations.

2.1.3 Proper dihedral potential: In stereochemistry, a proper dihedral is defined by rotation about the single bond within 4 consecutive atoms. Geometrically, it is the angle between the plane formed by atoms 1-2-3 and the plane formed by atoms 2-3-4.
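The geometric definition above can be turned into a small calculation. The sketch below, in plain Python, computes the angle between the plane of atoms 1-2-3 and the plane of atoms 2-3-4 from four Cartesian coordinates. The function names, the atan2-based formula and the sign convention are assumptions made for illustration; simulation packages may use a different sign convention.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p1, p2, p3, p4):
    """Angle (in degrees) between the planes 1-2-3 and 2-3-4."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)          # the central bond, about which the rotation happens
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)        # normal to the plane formed by atoms 1-2-3
    n2 = _cross(b2, b3)        # normal to the plane formed by atoms 2-3-4
    b2_length = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = b2_length * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates: atoms 2 and 3 lie on the x axis
    print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))   # atoms 1 and 4 on the same side
    print(dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # atoms 1 and 4 on opposite sides
```

With atoms 1 and 4 on the same side of the central bond the two planes coincide and the dihedral is 0°; with them on opposite sides it is 180° in magnitude.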
The classic example is the butane molecule, in which rotation about the bond between carbons 2 and 3 can produce dihedral angles between 0° and 360°, giving the conformations known as syn and anti (Figure 5). Because of this periodic behavior, the proper dihedral potential ϑ_proper is computed using a cosine function to model rotation about a single bond. Note that a chemical bond can rotate with greater or lesser difficulty (a higher or lower energy barrier, respectively) depending on its chemical environment. For the same reason, a given dihedral has torsional preferences at the angles of lowest energy (energy minima). The cosine function uses the variables K_φ, n and γ, which, once applied, yield dihedral angles with maximum and minimum energies and different amplitudes: this is what we call the torsional profile. In general, the torsional profile of a proper dihedral is computed in MM for specific chemical patterns, since the chemical environment directly affects the location of the energy minima and the height of the torsional barriers. In practice, this means that parameters computed for one chemical pattern should not be extrapolated to a very different one. Historically, torsional profile parameters have been calibrated against the torsional profile of the bond in question computed by quantum methods (Figure 5). This kind of quantum calculation is usually performed without solvent molecules because of the computational cost, so in practice the energies associated with the torsion of a chemical bond are computed in vacuum. It is therefore assumed that the effect of the solvent on the energies will be adequately represented through additivity.

Figure 1: Schematic representation of the potential energy function of a force field, including the proper dihedral term K_φ[1 + cos(nφ + γ)]. The ϑ_covalent term is the sum of ϑ_bond, ϑ_angle, ϑ_proper and ϑ_improper, while the ϑ_non-covalent term is the sum of the ϑ_electrostatic and ϑ_Lennard-Jones potentials (based on Serdyuk, Zaccai and Zaccai, and on Leach).

Figure 2: Left: Representation of the Morse potential energy and the Hooke potential energy (harmonic potential), which differ mainly for deformations larger than 0.1 nm. Right: bond potential energy, in which b represents the deformation of the spring and K_b represents how quickly the energy increases as a function of b.

Figure 3: Angle-bending potential energy, in which θ represents the deformation of the spring and K_θ represents how quickly the energy increases as a function of θ.

Figure 4: Proper dihedral potential energy, based on a cosine function, in which n is the number of minima in one wavelength, γ is the phase shift of the function and K_φ is associated with the height of the energy barrier for crossing between minima.

Figure 5: Overlay of the torsional profiles of the butane molecule computed with molecular mechanics and quantum mechanics. Calibrating the parameters of the proper dihedral potential function against the torsional profile computed by quantum methods ensures an accurate description of molecular conformations in molecular mechanics.

2.1.4 Improper dihedral potential: Still on the subject of dihedrals, an improper dihedral is defined by the out-of-plane angle between 4 non-consecutive atoms. The mathematical definition is the same as for a proper dihedral. In the practical context of molecular mechanics, an improper dihedral does not vary between 0° and 360°, since that would disturb molecular stability.
Thus, ϑ_improper is also computed with Hooke's law, with the angle between the planes formed by atoms 1-2-3 and 2-3-4 varying around an equilibrium angle (Figure 6). For example, the acetone molecule (CH3-C(=O)-CH3) is planar, and the bond and angle terms alone are not sufficient to maintain this planarity, which makes an additional term necessary. For this reason, an improper dihedral cannot be rotated like a proper dihedral: the farther it is from the equilibrium angle, the higher the potential energy. In practice, improper dihedrals are useful for defining out-of-plane angles that should not oscillate. Examples are the planarity of an aromatic ring and the pyramidal geometry of a tetrahedral sp3 atom.

Figure 6: The potential energy of improper dihedrals can be modeled with Hooke's harmonic potential, in which ξ represents the deformation of the dihedral relative to its optimal value and K_ξ represents how quickly the energy increases as a function of ξ.

2.1.5 Electrostatic potential: More electronegative atoms attract electrons more strongly than less electronegative atoms, which results in an electronic difference between them and, therefore, in electrostatic attraction or repulsion. In quantum methods, attractions and repulsions are computed by modeling the atomic orbitals, which is far too computationally expensive for simulating biomolecules. An adaptation is therefore needed for faster simulation. For this reason, the electrostatic potential energy ϑ_electrostatic can be computed in MM using Coulomb's law: each atom carries a negative or positive partial charge, and atoms attract or repel one another with a potential energy proportional to the inverse of the distance between them (Figure 7).
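As an illustration of the Coulomb term, the sketch below evaluates the electrostatic energy of a pair of point charges. The unit system (partial charges in units of the elementary charge, distances in nm, energies in kJ/mol), the function name and the example charges are assumptions made for this sketch; the prefactor 1/(4πε0) is computed from the SI values of the physical constants instead of being hard-coded.

```python
import math

# SI values of the physical constants
ELEMENTARY_CHARGE = 1.602176634e-19      # C
VACUUM_PERMITTIVITY = 8.8541878128e-12   # F/m
AVOGADRO = 6.02214076e23                 # 1/mol

# 1/(4*pi*eps0) expressed in kJ mol^-1 nm e^-2:
# J*m per pair -> J*m/mol (Avogadro) -> kJ*m/mol (1e-3) -> kJ*nm/mol (1e9)
COULOMB_FACTOR = (ELEMENTARY_CHARGE ** 2 * AVOGADRO
                  / (4.0 * math.pi * VACUUM_PERMITTIVITY)) * 1e-3 * 1e9


def coulomb_energy(q_i, q_j, r_nm):
    """Electrostatic energy (kJ/mol) of two point charges q_i and q_j (in e) at distance r_nm (nm)."""
    if r_nm <= 0.0:
        raise ValueError("distance must be positive")
    return COULOMB_FACTOR * q_i * q_j / r_nm


if __name__ == "__main__":
    # Hypothetical partial charges, for illustration only
    print(coulomb_energy(+0.4, -0.8, 0.3))   # opposite signs: negative (attractive) energy
    print(coulomb_energy(+0.4, +0.4, 0.3))   # same sign: positive (repulsive) energy
```

Doubling the distance halves the energy, which is the 1/r dependence described in the text.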
The attentive reader will notice that applying Coulomb's law requires partial atomic charges to be treated as point charges, which conflicts with the volumetric picture of atoms and their orbitals in space. Force field philosophies can therefore differ significantly in how they handle this electronic transposition from QM to MM. Some force fields use quantum methods to compute the charge distribution on the surface of the molecule (also known as the electrostatic potential surface) and, from it, derive an atomic point charge that reproduces it at the atomic radius. Other force fields use physicochemical properties of organic liquids as calibration targets for their charge sets. In this case, the organic liquid is simulated and the charges are adjusted empirically until the experimental properties of interest are adequately reproduced. In general, it is not feasible to generate new partial atomic charges for an entire macromolecule because of the complexity of the calculation. Instead, it became common to parameterize the constituent monomers, either by computing the charges with quantum methods or by calibrating them empirically, which made parameterization much simpler and more tractable while enabling a large leap in the use of force fields. Using Coulomb's law implies that the electrostatic potential surface of a given atom is uniform (isotropic). In classical MM, ϑ_electrostatic has the same value at any point on the surface of an atom, which makes it impossible to describe the anisotropic surfaces (with different electron densities) that atoms show within molecules, as commonly seen in quantum calculations.

2.1.6 Lennard-Jones potential: Electrostatic interactions are not sufficient to model all non-covalent interactions.
Other forces are observed experimentally, such as van der Waals interactions. This type of interaction was first observed in studies of argon gas, in which the interaction energy of two atoms was found to be zero at infinite distance; as the atoms approach, the energy decreases (becoming attractive) and passes through a minimum (maximum attraction) at the equilibrium distance, then rises rapidly as the distance decreases further, until it becomes zero at a certain distance and repulsive thereafter (Figure 8).

Figure 7: The electrostatic potential energy follows the Coulomb potential energy, in which opposite charges produce a negative energy and like charges produce a positive energy, both proportional to the inverse of the distance r.

For the purposes of classical mechanics, the Lennard-Jones potential ϑ_Lennard-Jones is often used to model this type of interatomic interaction. Its most widespread form is the 6-12 potential, which has 2 adjustable parameters: the collision diameter σ (the interatomic distance at which the energy is zero) and the depth of the energy minimum ε. However, ϑ_Lennard-Jones is computed for each pair of atoms in a system. This means that each combination of atom types has specific σ and ε parameters to model its interaction. Thus, in a system with N atom types, N(N-1)/2 parameter sets are needed to describe the interactions between unlike atom types, in addition to the N sets for pairs of the same type. Lennard-Jones parameters can be obtained in various ways. For example, some force fields calibrate these parameters to reproduce thermodynamic properties of organic liquids. In other cases, the parameters may be calibrated to reproduce crystal packing energies.
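The 6-12 form and the pair counting described above can be sketched in a few lines of Python. The function names and the numerical values of σ and ε are illustrative assumptions (roughly argon-like, in nm and kJ/mol) and are not taken from any particular force field.

```python
def lennard_jones(r, sigma, epsilon):
    """6-12 Lennard-Jones energy for a pair of atoms at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)   # repulsive r^-12 term minus attractive r^-6 term


def unlike_pair_count(n_types):
    """Number of distinct pairs of different atom types: N(N-1)/2."""
    return n_types * (n_types - 1) // 2


def total_pair_count(n_types):
    """Unlike pairs plus the N pairs of identical atom types."""
    return unlike_pair_count(n_types) + n_types


if __name__ == "__main__":
    sigma, epsilon = 0.34, 1.0            # illustrative values (nm, kJ/mol)
    r_min = 2.0 ** (1.0 / 6.0) * sigma    # distance of the energy minimum
    print(lennard_jones(sigma, sigma, epsilon))   # zero at r = sigma
    print(lennard_jones(r_min, sigma, epsilon))   # close to -epsilon at the minimum
    print(unlike_pair_count(4), total_pair_count(4))
```

The energy is zero at r = σ, reaches its minimum of about -ε at r = 2^(1/6) σ, and becomes strongly repulsive at shorter distances, matching the curve described for Figure 8.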
Note that the variety of Lennard-Jones parameters is directly tied to the ability of a force field to model different types of biomolecules, since the chemical environment of an atom can change drastically depending on the biomolecule of interest. One example is the oxygen atoms in the hydroxyl groups of proteins and the ester oxygen atoms in lipids, which have different parameters in most current force fields so that they can be described accurately.

Figure 8: Lennard-Jones potential energy, in which σ represents the interatomic distance at which the energy is zero and ε represents the depth of the interaction energy minimum.

3 What is a force field?

A force field is defined as a set of calibrated parameters that are applied to the potential function to describe molecular energies and conformations in a simulation. They are also called topological parameters, because they define the configuration of a molecule (and therefore its topology). However, generating new Lennard-Jones parameters, partial atomic charges or torsional parameters often involves new quantum calculations and an extra, often high, computational cost. Force fields are therefore given the attribute of modularity, which consists of parameterizing modules (usually common chemical groups or molecular fragments) and using those modules to describe more complex molecules. A common example is the parameterization of amino acids and their use to describe the conformation of proteins. Another example is the use of the same set of Lennard-Jones parameters for atoms in different chemical environments (solvent, neighboring chemical groups, etc.). In this case, it is assumed that well-calibrated modules exert an additive effect on one another that allows more complex biomolecules to be described.
Therefore, the modularity of a force field, together with its diversity of parameters, is what allows it to cover and describe different types of biomolecules without a large case-by-case parameterization effort. Naturally, the quality of these parameters directly affects the accuracy of the predicted energies and conformations. For this reason, force fields are commonly tested exhaustively in search of possible improvements or unknown flaws. That said, it is also important to recognize that it is impossible to calibrate a model to describe every desired property, so each force field defines a set of target properties to describe. Hence there are no "better or worse" force fields, only force fields that are more or less suitable for describing certain properties or biomolecules. The types of force fields are discussed below.

Figure 9: Topological terms observable in the alanine dipeptide, governed by potential functions in an AE model. At each instant dt, all potentials are computed as a function of the position r of the respective atoms involved, thus providing the contributions that make up ϑ_total for that instant.

4 Levels of resolution in force fields

The level of resolution (or "granularity") of a force field can vary, allowing systems of different sizes to be modeled at low computational cost. The "grain" here is the smallest particle described in the force field. So far, we have considered each atom in a system to be a "grain" treated explicitly, that is, as a perfectly spherical body of nonzero mass governed by potential energy parameters. This type of high-granularity force field is known as an explicit-atom (AE) model, commonly called an all-atom model.
However, the number of covalent and non-covalent parameters grows quickly with the number of atoms in a system, which creates practical disadvantages, whether in parameterizing new molecules or in the computational cost of simulating large systems. Medium-granularity models therefore incorporate the concept of pseudo-atoms, merging 2 or more atoms into a single representative "grain" while describing the remaining atoms (usually polar atoms) explicitly. These so-called united-atom (AU) models are still considered to have atomistic resolution, since they can describe the most common atom-to-atom interactions in biomolecular systems. There are also low-granularity (BG) models, commonly called coarse-grained models, in which individual atoms can no longer be distinguished: the pseudo-atom concept is extended and applied to entire chemical groups. In these force fields it is not unusual for a single amino acid to consist of only two or four pseudo-atoms, which allows much longer time scales to be reached at low computational cost in exchange for the loss of atomistic resolution (Figure 10).

Figure 10: Levels of granularity of force fields. While the high-granularity models (AE and AU) can describe atom-to-atom interactions, the low-granularity models (BG) extend the pseudo-atom concept to entire chemical groups.

Despite the apparent simplifications, force fields at each of these levels of resolution require exhaustive parameterization to reproduce observable experimental properties. Within these classes, families of force fields were developed to systematically describe biomolecules such as proteins, nucleic acids, carbohydrates and lipids. Each of these levels of resolution and the peculiarities of each force field family are discussed below.
4.1 All-atom models

In the early 1970s, Martin Karplus and co-workers at Harvard (USA) had access to the first studies on force field development, which at the time already used a potential function close in form to the one known today. In 1977, Karplus, Gelin and McCammon published the landmark paper "Dynamics of Folded Proteins", considered by many to be the foundation of current biomolecular simulation studies. At the time, the UA philosophy was still crucial for treating even simple systems, because of limited computing power. For the same reason, most systems were simulated without solvent molecules and with a dedicated potential for hydrogen bonds. A few years later, Paul Weiner, a former postdoctoral researcher in the Karplus laboratory, joined Peter Kollman's laboratory; together they developed the ff84 force field (also UA) and, two years later, published an AA version called ff86. In 1988, William L. Jorgensen took some of Weiner's parameters and recalibrated the Lennard-Jones parameters and partial atomic charges of ff84 to reproduce physicochemical properties of organic liquids, in order to bring the simulated properties closer to those of the liquid phase. With these changes, Jorgensen's parameters made a dedicated hydrogen-bond potential unnecessary, which led to the potential function known today. New AA force fields have been developed on these foundations ever since.

4.1.1 CHARMM

Initially released in 1983 together with the simulation software of the same name, the parameters of the CHARMM family ("Chemistry at HARvard Macromolecular Mechanics") also originated in the UA philosophy. They were first developed under the supervision of Martin Karplus and eventually migrated to AA models. The CHARMM force field is currently supervised by Alex MacKerell at the University of Maryland (USA).
The bond and angle parameters of CHARMM were taken from crystal structures of amino acids, but the CHARMM philosophy includes an extra term in its potential function (the Urey-Bradley potential) intended to bring the angle vibrations closer to experimentally observed vibrational spectra. The partial atomic charges were derived empirically from the interaction energies of dimers of small organic molecules obtained from quantum calculations. Later versions kept the original strategy but adopted more robust quantum calculations and more modern water models. After its release as CHARMM04, the development of new amino acid and nucleic acid parameters led to CHARMM22. CHARMM36 followed, with new lipid and carbohydrate parameters and revisions of the earlier ones. For organic molecules, a set of general parameters named CGen-FF (CHARMM General Force Field) was developed to describe small ligands in a manner compatible with the CHARMM philosophy.

4.1.2 AMBER

Although ff86 is regarded as part of the family, ff94 is considered by many to be the first force field of the AMBER family ("Assisted Model Building with Energy Refinement"). Over the years, much effort has gone into expanding the AMBER parameter set to describe more biomolecules, and into expanding the simulation software of the same name. The AMBER force fields were the first to systematically derive the partial atomic charges of their atoms from quantum calculations. In short, the electrostatic potential surface is computed in vacuum with the Hartree-Fock (HF) method and the 6-31G* basis set, and the partial atomic charges, fixed at the atomic centers, are then derived from it.
In addition, its torsional parameters were fitted to reproduce torsional profiles computed by quantum mechanics (QM). Although the Lennard-Jones parameters developed by Jorgensen in 1988 were used in the development of ff94, newer versions of the AMBER force field include new parameters that cover a wider range of chemical environments for its atoms, allowing a greater variety of biomolecules to be described. A general version of the force field, named GAFF (General Amber Force Field), was also developed to describe simple organic molecules and small ligands in a manner compatible with the AMBER philosophy. Because of the computational limitations at the time ff94 was released, its torsional parameters were fitted to reproduce only a few conformations of low potential energy. This left room for new versions with adjusted dihedral parameters for proteins, nucleic acids, carbohydrates and lipids, such as ff99SB-ildn, ff-parmbsc1 and ff14SB.

4.1.3 OPLS-AA

Also inspired by ff84, the UA parameter set developed by Jorgensen in 1988 was named OPLS ("Optimized Potentials for Liquid Simulations") because its non-bonded parameters were calibrated to reproduce physicochemical properties of organic liquids. As computing power advanced, a new AA version was developed and named OPLS-AA. Broadly, calibration against physicochemical properties of organic liquids was retained, through empirical adjustment of the partial atomic charges and Lennard-Jones parameters, while the bond and angle parameters of OPLS-AA were taken from ff94. New dihedral parameters were obtained from quantum calculations using the restricted Hartree-Fock (RHF) method and the 6-31G* basis set.
Although OPLS-AA was clearly developed to describe the energetics of organic liquids, its more recent versions, such as OPLS2005 and OPLS2.1, introduced improvements aimed at describing amino acid rotamer conformations and, consequently, the three-dimensional structure of peptides and proteins. Currently, OPLS3 is restricted to commercial use within the Schrödinger software, although OPLS-AA is still maintained and distributed free of charge under the supervision of William L. Jorgensen at Yale University (USA).

4.2 United-atom models

Fundamentally, all force field families started as UA models in order to reduce computational cost. Only with advances in hardware and processing capacity did some families migrate to treating all atoms of the system explicitly. UA force fields use the concept of pseudoatoms to model groups of atoms. One example is treating a methyl group (-CH3) as a single particle whose mass equals that of the methyl group and whose Lennard-Jones parameters are calibrated to describe the interaction distances and energies of the group. An immediate advantage is that the bond, angle and dihedral parameters involving the methyl hydrogens become unnecessary, as do the partial atomic charges of those hydrogens. The use of pseudoatoms thus simplifies parameterization by requiring fewer terms to be calibrated. A further advantage is that the evaluation of the potential function involves fewer atoms and, consequently, fewer terms. A biomolecule simulated with UA models will therefore require less computing time than the same biomolecule simulated with AA models.
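The saving described above can be made concrete with a small counting exercise. The sketch below is only an illustration under stated assumptions: the `HeavyAtom` class, the function names and the choice of ethanol are hypothetical and do not come from any force field package, and the pair count is a simple upper bound that ignores exclusions and cutoffs.

```python
from dataclasses import dataclass

# Standard atomic masses in g/mol (rounded).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


@dataclass
class HeavyAtom:
    element: str           # chemical symbol of the heavy atom
    n_hydrogens: int       # hydrogens bonded to this heavy atom
    merge_hydrogens: bool  # True for aliphatic CHn groups (assumed pseudoatoms)


def all_atom_count(molecule):
    """Number of particles when every atom is treated explicitly."""
    return sum(1 + atom.n_hydrogens for atom in molecule)


def united_atom_count(molecule):
    """Number of particles when each aliphatic CHn group is one pseudoatom."""
    return sum(1 if atom.merge_hydrogens else 1 + atom.n_hydrogens
               for atom in molecule)


def pseudoatom_mass(atom):
    """Mass of a pseudoatom: the heavy atom plus its merged hydrogens."""
    return ATOMIC_MASS[atom.element] + atom.n_hydrogens * ATOMIC_MASS["H"]


def pair_count(n_particles):
    """Upper bound on distinct particle pairs, n(n-1)/2."""
    return n_particles * (n_particles - 1) // 2


# Ethanol, CH3-CH2-OH: the hydroxyl hydrogen is polar and stays explicit.
ethanol = [
    HeavyAtom("C", 3, True),
    HeavyAtom("C", 2, True),
    HeavyAtom("O", 1, False),
]

if __name__ == "__main__":
    n_explicit = all_atom_count(ethanol)
    n_united = united_atom_count(ethanol)
    print("explicit particles:", n_explicit, "pairs:", pair_count(n_explicit))
    print("united particles:  ", n_united, "pairs:", pair_count(n_united))
    print("CH3 pseudoatom mass:", round(pseudoatom_mass(ethanol[0]), 3))
```

For ethanol, this counting gives 9 explicit particles against 4 united particles, so the number of possible pairs drops from 36 to 6. Real simulations apply cutoffs and exclusions, so the actual saving depends on the system and the software.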
4.2.1 GROMOS

The GROMOS force field was originally developed together with a simulation software of the same name (GROningen MOlecular Simulation), under the supervision of Wilfred van Gunsteren and Herman Berendsen in Groningen, the Netherlands. Its protein parameter set later spread to other software packages, which led to its expansion over the years to describe carbohydrates, lipids and nucleic acids (see Table 1). The GROMOS force field uses pseudoatoms for aliphatic CH1, CH2 and CH3 groups (polar and aromatic hydrogens are kept explicit). The rationale is that these aliphatic hydrogens take part in little or no electrostatic interaction because of their apolar character. The Lennard-Jones parameters of these pseudoatoms are therefore calibrated to describe the hydrophobic interactions they make with their surroundings. The GROMOS parameterization process is based on physicochemical properties of organic liquids, as introduced in version 43A1. In short, the topological terms of a molecule are defined empirically and used to simulate the physicochemical properties of an organic liquid, which are then compared with the corresponding experimental properties, seeking the smallest possible absolute error. In principle, the aim of this approach is to bring the condensed-phase behavior of the simulated molecules close to that observed experimentally. Historically, GROMOS parameters have been calibrated against physicochemical properties such as density, enthalpy of vaporization and solvation free energy, and have also been tested on the description of interproton distances obtained by nuclear magnetic resonance. Over the years, the GROMOS force field has been expanded to describe various types of biomolecules, such as carbohydrates, lipids and nucleic acids, as in the more recent versions GROMOS53A6, GROMOS54A7 and GROMOS54A8.
Currently, topological parameters for small organic molecules or ligands can be generated automatically by the ATB (Automated Topology Builder) server, since the GROMOS development team provides no official tool for this task.

4.3 Coarse-grained models

Two years before Karplus's famous 1977 work simulating the bovine pancreatic trypsin inhibitor with fine-grained models, Michael Levitt and Arieh Warshel simulated the same protein using coarse-grained models. Their work was an attempt to resolve Levinthal's paradox by describing the folding dynamics of a small protein and, although it did not describe the complete process in detail, it contributed strongly to the understanding of packing effects and pairwise interactions in protein folding. In 2013, Levitt, Warshel and Karplus were awarded the Nobel Prize in Chemistry for their enormous contribution to the structural biochemistry of biomolecules. At that time, limited computing power allowed only a few simulation iterations. With advances in computing power and the wide adoption of AA and UA models by the scientific community, coarse-grained models also became increasingly popular by exploiting their greatest advantage: allowing large molecular complexes to be simulated over longer time scales at an affordable computational cost. To this end, pseudoatoms take the place of sets of atoms or entire chemical functions. For an accurate description, the parameters governing these pseudoatoms are calibrated to describe interaction energies, observable three-dimensional structures or other experimentally measurable properties.
4.3.1 Martini

Perhaps one of the most widely used CG force fields today, Martini takes its name from a nickname of the city of Groningen (the Netherlands), where the force field was developed and is still maintained by the group of Siewert-Jan Marrink at the University of Groningen. In general terms, the pseudoatoms of the Martini family are categorized as Q (charged), P (polar), N (nonpolar) and C (apolar). Each of these categories branches into several levels, which together make up all the pseudoatom types available to describe different kinds of biomolecules. In 2004, the initial version contained only 9 pseudoatom types, which were expanded to 18 in Martini 2.0 in 2007. A few years later, the force field was extended to describe carbohydrates and nucleic acids as well. While the bond, angle, and proper and improper dihedral terms were derived from simulations of crystal structures using AA models, the partial charges of the pseudoatoms were adjusted empirically to reproduce experimental solvation free energies, vaporization free energies and water/octanol partition free energies. In CG models, the simplified granularity reduces the degrees of freedom of molecular motion (fewer particles that vibrate and fewer bonds that rotate). This leads to an underestimation of the computed entropy, which requires the contribution of the enthalpic terms to be reduced as compensation so that the free energy is described correctly. However, the enthalpic and entropic components, taken separately, may not have accurate absolute values. The group led by Marrink has also been developing hybrid simulation approaches, in which part of the system (usually the solvent or lipid bilayers) is treated with CG models while the rest is described with AA models.
This combines the lower computational cost of coarse-grained models with the atomistic resolution of fine-grained models.

4.3.2 Sirah

Sirah (Southamerican Initiative for a Rapid and Accurate Hamiltonian) is a South American initiative developed at the Laboratorio de Simulaciones Biomoleculares, coordinated by Sergio Pantano, at the Institut Pasteur in Montevideo (Uruguay). The development philosophy of Sirah uses a parameterization approach different from those of the force fields discussed so far. In those, the parameters are calibrated to describe thermodynamic properties of organic liquids (GROMOS, OPLS and Martini) or interaction energies of molecular fragments that serve as building blocks for polymers or complex biomolecules (CHARMM and AMBER). The Sirah developers instead used structural and energetic properties of complex biomolecules as calibration targets, reflecting the focus of CG models on the tertiary and quaternary structures of biomolecules. For example, the partial charges of the protein backbone were derived to reproduce the electrostatic potential surface of an α-helix (its macrodipole). Since its release in 2014 with a parameter set for proteins, Sirah has been extended to describe lipids and nucleic acids as well, for a total of 59 available pseudoatom types. The force field recently underwent a parameter revision, which led to the release of Sirah 2.0. Like Martini, Sirah also supports hybrid AA/CG simulations, and it has recently been combined with models of even coarser granularity (supra-CG) in hybrid simulations.

5 Choosing a force field

A subject this theoretical often leaves the reader with much abstraction and little practical sense.
The reader may therefore be asking: "Which force field should I choose?" or, in practice, "What is the difference between CHARMM and AMBER?". Indeed, deciding which parameter set to use in a simulation is not a task to be underestimated or carried out carelessly, since a poor choice of force field will only reveal itself after the simulation has finished, which can take weeks or months.

Table 1

Family | Type of molecule | Version | Year | Level of resolution
CHARMM | Amino acids | CHARMM4 | 1983 | UA
CHARMM | Amino acids | CHARMM19 | 1985 | AA
CHARMM | Amino acids | CHARMM22 | 1992 | AA
CHARMM | Amino acids, lipids, nucleic acids and carbohydrates | CHARMM36 | 2012 | AA
CHARMM | Nucleic acids | CHARMM22 | 1995 | AA
CHARMM | Nucleic acids | CHARMM27 | 2000 | AA
CHARMM | Lipids | (in CHARMM36) | 2010 | AA
CHARMM | Organic molecules | CGen-FF | 2010 | AA
CHARMM | Carbohydrates | (in CHARMM36) | 2011 | AA
AMBER | Amino acids, nucleic acids | ff84 | 1984 | UA
AMBER | Amino acids, nucleic acids | ff86 | 1986 | AA
AMBER | Amino acids, nucleic acids | ff94 | 1994 | AA
AMBER | Amino acids, nucleic acids | ff99 | 1999 | AA
AMBER | Amino acids, nucleic acids | ff99SB | 2006 | AA
AMBER | Amino acids, nucleic acids | ff14SB | 2014 | AA
AMBER | Organic molecules | GAFF | 2004 | AA
AMBER | Lipids | GAFFlipid | 2012 | AA
AMBER | Lipids | Lipid14 | 2014 | AA
AMBER | Carbohydrates | GLYCAM_03 | 1995 | AA
AMBER | Carbohydrates | GLYCAM2000 | 2001 | AA
AMBER | Carbohydrates | GLYCAM06 | 2007 | AA
OPLS | Organic molecules, amino acids | OPLS | 1988 | UA
OPLS | Organic molecules, amino acids | OPLS-AA | 1996 | AA
OPLS | Organic molecules, amino acids | OPLS-AA/L | 2001 | AA
OPLS | Organic molecules, amino acids | OPLS2005 | 2005 | AA
OPLS | Organic molecules, amino acids | OPLS2.1 | 2012 | AA
OPLS | Organic molecules, amino acids | OPLS3 | 2016 | AA
OPLS | Carbohydrates | OPLS-AA | 1997 | AA
OPLS | Nucleic acids | OPLS-AA/M | 2019 | AA
GROMOS | Amino acids | 26C1 | 1982 | UA
GROMOS | Amino acids, nucleic acids | 37C4 | 1984 | UA
GROMOS | Amino acids, nucleic acids | 43A1 | 1996 | UA
GROMOS | Amino acids, lipids and nucleic acids | 45A3 | 2001 | UA
GROMOS | Amino acids, lipids and nucleic acids | 53A5/53A6 | 2004 | UA
GROMOS | Amino acids, lipids and nucleic acids | 54A7 | 2011 | UA
GROMOS | Amino acids, lipids and nucleic acids | 54A8 | 2012 | UA
GROMOS | Lipids | (in 54A7) | 2009 | UA
GROMOS | Organic molecules | ATB1.0 | 2011 | UA or AA
GROMOS | Organic molecules | ATB2.0 | 2014 | UA or AA
GROMOS | Organic molecules | ATB3.0 | 2018 | UA or AA
GROMOS | Carbohydrates | 53A6CARBO | 2010 | UA
GROMOS | Carbohydrates | 53A6GLYC | 2012 | UA
GROMOS | Carbohydrates | 56A6CARBO_R | 2016 | UA
SIRAH | Amino acids | Sirah 1.0 | 2014 | CG
SIRAH | Lipids | Sirah 1.0 | 2017 | CG
SIRAH | Nucleic acids (DNA) | Sirah 1.0 | 2017 | CG
SIRAH | Amino acids, lipids and nucleic acids | Sirah 2.0 | 2018 | CG
MARTINI | Lipids | Martini 1.0 | 2004 | CG
MARTINI | Lipids | Martini 2.0 | 2007 | CG
MARTINI | Proteins | Martini 2.0 | 2008 | CG
MARTINI | Carbohydrates | Martini 2.0 | 2009 | CG
MARTINI | Nucleic acids (DNA) | Martini 2.0 | 2015 | CG
MARTINI | Nucleic acids (RNA) | Martini 2.0 | 2017 | CG

In general, two questions can help when choosing a force field:

1 - "What level of resolution does my simulation need?" - Here the choice is between the fine-grained philosophies (AA or UA) and the coarse-grained one (CG). A simulation that aims to observe atom-to-atom interactions, such as salt bridges and hydrogen bonds, or even secondary-structure motions, should use the AA or UA philosophy, since these are the only ones with the atomistic resolution needed to observe such events. By contrast, a simulation that aims to evaluate the tertiary and quaternary structures of a biomolecule, or that involves systems with many atoms, may opt for CG models, which allow longer time scales (on which large molecular motions occur) to be simulated at lower computational cost.

2 - "Which types of biomolecules does my system contain?" - The last 40 years of biomolecular simulation have produced a vast number of scientific studies applying and testing the parameters developed over that time, which inevitably revealed deficiencies of force fields in describing certain types of biomolecules or specific conditions. Although many corrections have been proposed and implemented in the most recent versions of each force field, limitations remain and are still targets for improvement. Hence, even though most force fields now have parameters for the four main classes of biomolecules (proteins, carbohydrates, lipids and nucleic acids), some are more accurate than others for each class. For simple systems, such as water-soluble proteins, all force fields covered in this chapter give acceptable results, although not necessarily mutually reproducible ones (for more on the reproducibility of simulations, see Chapter XIII).
The reader will find in the scientific literature many studies using each of these force fields to support that choice. However, systems containing the other biomolecules, such as membrane-embedded and glycosylated proteins, or even DNA or RNA, should be checked carefully against the choices most frequently made in the scientific community.

6 Recommended reading

• LEACH, A. R. Molecular modelling: principles and applications. [S.l.]: Longman, 1996. 595 p. ISBN 9780582239333.
• DAUBER-OSGUTHORPE, P.; HAGLER, A. T. Biomolecular force fields: where have we been, where are we now, where do we need to go and how do we get there? Journal of Computer-Aided Molecular Design, Springer International Publishing, p. 1–71, nov 2018. ISSN 0920-654X.
• RINIKER, S. Fixed-Charge Atomistic Force Fields for Molecular Dynamics Simulations in the Condensed Phase: An Overview. Journal of Chemical Information and Modeling, American Chemical Society, v. 58, n. 3, p. 565–578, mar 2018. ISSN 1549-9596.
• KARPLUS, M.; WEAVER, D. L. Protein-folding dynamics. Nature, v. 260, n. 5550, p. 404–406, apr 1976. ISSN 0028-0836.
• WEINER, S. J. et al. An all atom force field for simulations of proteins and nucleic acids. Journal of Computational Chemistry, John Wiley & Sons, Ltd, v. 7, n. 2, p. 230–252, apr 1986. ISSN 01928651.
• JORGENSEN, W. L.; TIRADO-RIVES, J. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. Journal of the American Chemical Society, American Chemical Society, v. 110, n. 6, p. 1657–1666, mar 1988. ISSN 0002-7863.
6 General discussion

"Science is a way of thinking much more than it is a body of knowledge." Carl Sagan

6.1 The parameterization strategy

To date, the GROMOS development group has not produced an official protocol for the systematic parameterization of small ligands with a focus on drug development, nor an automated tool for that purpose, as exists for GAFF [172] or CGenFF [160]. To fill this gap, Malde et al. [174] developed the Automated Topology Builder (ATB) server to produce GROMOS-compatible topologies in a systematic and automated way. In their strategy, the user provides the ligand structure, and the bond, angle and improper dihedral terms are derived from quantum calculations, while dihedral terms are inferred from the parameters already present in GROMOS54a7 [166]. Similarly, Lennard-Jones parameters are inferred from atom connectivity. Partial atomic charges are derived from the electrostatic potential surface computed by B3LYP/6-31G* quantum calculations combined with the PCM implicit solvation model [262, 263]. It should be stressed, however, that the charge derivation strategy used by the ATB server does not exploit the transferability of previously computed and calibrated charge groups; instead, it runs a new quantum calculation to derive new partial charges. Besides the extra computational cost, the charges obtained in this way are sensitive to the ligand conformation, which in turn is sensitive to the explicit presence of solvent. This hinders the systematic derivation of a set of partial charges for highly flexible molecules.
In contrast, calibrating more rigid functional groups or molecular fragments and using them to build the topology of flexible ligands may offer a methodological advantage. With this in mind, the work reported in Chapter I of this thesis focused on establishing a parameterization protocol for small ligands based on the GROMOS philosophy, exploiting the modularity of functional groups or molecular fragments. In particular, as a proof of concept, our study used the aromatic rings most commonly found in drugs, given their well-known extensive use [257, 264, 265]. As shown in Polêto et al. [185] (and reproduced in Table 4), the parameterization strategy developed by the authors yielded topological parameters for aromatic rings that match or outperform GAFF, OPLS-AA and 2016H66 in describing physicochemical properties of organic liquids, suggesting its robustness. One advantage of the strategy is that the parameters of the calibrated functional groups can be extrapolated to new aromatic heterocycles. In that case, the already parameterized module is used together with the algorithm described in section 4.2 to derive new partial charges, based on the dipole moment vector obtained from quantum calculations. Beyond validating the parameterization strategy, the work of Polêto et al. [185] also aims to characterize dynamic biological properties of the studied molecules in aqueous solution. To that end, properties derived from the hydrogen bonds between the heteroatoms of the aromatic rings and water molecules were computed, such as the average number of hydrogen bonds (AverHB), their residence time (τHB), their half-life (lifetimeHB), the free energy of breaking these bonds (∆GHB) and their occupancy over the simulation (Percent).
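To give a feel for how quantities of this kind can be extracted from a trajectory, the sketch below summarizes a per-frame count of hydrogen bonds. It is a minimal illustration under explicit assumptions: the function name `hbond_summary`, the input format and the simple definitions used here (occupancy as the percentage of frames with at least one bond, and mean duration of uninterrupted bonded stretches) are hypothetical and are not necessarily the definitions used in Polêto et al. [185].

```python
def hbond_summary(counts, dt_ps):
    """Summarize a hydrogen-bond time series.

    counts: number of hydrogen bonds observed in each frame (assumed input).
    dt_ps:  time between consecutive frames, in picoseconds.
    Returns (average number of bonds, occupancy in percent,
             mean duration of uninterrupted bonded stretches in ps).
    """
    n_frames = len(counts)
    average = sum(counts) / n_frames
    occupancy = 100.0 * sum(1 for c in counts if c > 0) / n_frames

    # Collect the lengths (in frames) of consecutive runs with >= 1 bond.
    stretches = []
    run = 0
    for c in counts:
        if c > 0:
            run += 1
        elif run:
            stretches.append(run)
            run = 0
    if run:
        stretches.append(run)

    mean_stretch_ps = dt_ps * sum(stretches) / len(stretches) if stretches else 0.0
    return average, occupancy, mean_stretch_ps


if __name__ == "__main__":
    # Made-up series of eight frames saved every 2 ps.
    series = [1, 2, 0, 0, 1, 1, 1, 0]
    print(hbond_summary(series, dt_ps=2.0))
```

The continuous-stretch duration used here is only one of several possible lifetime definitions; autocorrelation-based lifetimes, for instance, would give different numbers for the same trajectory.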
Furthermore, the effects of substitutions vicinal to the heteroatoms were evaluated with respect to their impact on interactions with the solvent. This gave a better understanding of the role of solvent dynamics in describing these interactions and produced a catalog of biologically and thermodynamically relevant information that can be used in the optimization of lead compounds. Given the behavior of these interactions with the solvent, it may also be possible to extrapolate these energetic profiles to the evaluation of pharmacophore models. A direct consequence of using models well calibrated for physicochemical properties in the condensed phase is a better description of the thermodynamic properties of these fragments, as observed by Jorgensen and Tirado-Rives [157]. In this respect, force fields of the GROMOS philosophy have a strategic advantage over other force fields [165, 185] because they are parameterized using properties such as enthalpy of vaporization and solvation free energy. In particular, intra- and intermolecular interaction energies, and consequently the enthalpy, are sensitive to the partial atomic charges and Lennard-Jones parameters of both ligand and protein [175, 266], and the adequate calibration of this parameter set is decisive for the accuracy of the interaction energies. On the other hand, given the flexibility of ligands and the variation of their electrostatic potential surface with chemical context (bound to the protein, free in solution or in different conformational states), even well-calibrated parameters can be limited by the Newtonian framework of molecular dynamics (MD), or by the lack of polarization descriptors in classical simulations [149, 266]. Thus, the work of Polêto et al. [185] also aims to provide a set of calibrated parameters for aromatic rings commonly found in drugs, in order to improve the description of interaction energies between receptor and ligand, or between the ligand and the surrounding solvent, within classical molecular mechanics. A correct description of the enthalpies involved in these interactions is expected to provide a better understanding of binding modes to the target receptor and to assist in the design of new drugs.

Table 4 – Mean deviation between experimental and simulated physicochemical properties of aromatic rings evaluated in Polêto et al. [185]. Values for GAFF and OPLS-AA were taken from Caleman et al. [168], and values for 2016H66 from Horta et al. [184]. Density (ρ) in g/cm3, enthalpy of vaporization (∆Hvap) in kJ/mol, thermal expansion coefficient (αP) in 10-3/K, isothermal compressibility (κT) in 1/GPa, dielectric constant (ε), classical isobaric heat capacity (Cpcla) in J/mol×K.

Property | Force field | Sample size | Mean deviation | Standard deviation | R coefficient
ρ | Polêto et al. [185] | 42 | 0.008 | 0.051 | 0.92
ρ | 2016H66 | 6 | 0.016 | 0.019 | 0.99
ρ | GAFF | 40 | -0.008 | 0.045 | 0.93
ρ | OPLS-AA | 40 | 0.001 | 0.025 | 0.98
∆Hvap | Polêto et al. [185] | 42 | 1.514 | 4.457 | 0.96
∆Hvap | 2016H66 | 6 | 2.257 | 6.758 | 0.96
∆Hvap | GAFF | 40 | 2.298 | 5.419 | 0.88
∆Hvap | OPLS-AA | 40 | 3.243 | 5.216 | 0.90
Cpcla | Polêto et al. [185] | 42 | 88.201 | 33.440 | 0.77
Cpcla | 2016H66 | 6 | 98.712 | 35.232 | 0.63
Cpcla | GAFF | 37 | 133.884 | 40.225 | 0.84
Cpcla | OPLS-AA | 37 | 129.397 | 35.330 | 0.91
αP | Polêto et al. [185] | 42 | 0.146 | 0.210 | 0.82
αP | 2016H66 | 6 | 0.171 | 0.148 | 0.91
αP | GAFF | 40 | 0.224 | 0.220 | 0.58
αP | OPLS-AA | 40 | 0.155 | 0.210 | 0.64
κT | Polêto et al. [185] | 42 | 0.046 | 0.500 | 0.70
κT | 2016H66 | 6 | 0.276 | 0.279 | 0.71
κT | GAFF | 40 | 0.054 | 0.150 | 0.77
κT | OPLS-AA | 40 | -0.016 | 0.130 | 0.78
ε | Polêto et al. [185] | 42 | -4.523 | 5.650 | 0.65
ε | 2016H66 | 6 | -2.217 | 2.515 | 0.89
ε | GAFF | 29 | -4.254 | 2.740 | 0.97
ε | OPLS-AA | 33 | -4.564 | 5.600 | 0.72
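For readers who want to reproduce summary statistics of the kind reported in Table 4, the sketch below computes a mean deviation, a standard deviation and a Pearson correlation coefficient from paired simulated and experimental values. Several points are assumptions rather than facts from the source: the deviation is taken as simulated minus experimental (consistent with the signed values in the table), the standard deviation is the sample one (n - 1 in the denominator), R is taken to be the Pearson coefficient, and the function name and the numbers in the example are hypothetical.

```python
import math


def deviation_stats(simulated, experimental):
    """Return (mean deviation, sample standard deviation, Pearson R).

    Deviations are computed as simulated - experimental (assumed convention).
    Requires at least two pairs and non-constant inputs.
    """
    if len(simulated) != len(experimental) or len(simulated) < 2:
        raise ValueError("need two equally sized lists with at least two values")

    n = len(simulated)
    deviations = [s - e for s, e in zip(simulated, experimental)]
    mean_dev = sum(deviations) / n
    std_dev = math.sqrt(sum((d - mean_dev) ** 2 for d in deviations) / (n - 1))

    mean_sim = sum(simulated) / n
    mean_exp = sum(experimental) / n
    covariance = sum((s - mean_sim) * (e - mean_exp)
                     for s, e in zip(simulated, experimental))
    spread_sim = sum((s - mean_sim) ** 2 for s in simulated)
    spread_exp = sum((e - mean_exp) ** 2 for e in experimental)
    pearson_r = covariance / math.sqrt(spread_sim * spread_exp)

    return mean_dev, std_dev, pearson_r


if __name__ == "__main__":
    # Made-up densities in g/cm3, for illustration only.
    simulated_density = [0.880, 0.985, 1.105, 0.950]
    experimental_density = [0.874, 0.978, 1.110, 0.940]
    print(deviation_stats(simulated_density, experimental_density))
```

A mean deviation close to zero can hide large errors of opposite sign, which is why the table reports the standard deviation and the correlation coefficient alongside it.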
Furthermore, an accurate description of these interactions plays a fundamental role in the dynamics of the free ligand in solution, influencing the magnitude and lifetime of these interactions and, consequently, the conformations obtained, which allows conformational sampling with a higher degree of confidence.

6.2 Conformational sampling and analysis methodology

From a thermodynamic point of view, the entropic cost of forming the receptor-ligand complex is directly correlated with the conformational populations of the ligand that exist in the biological medium: the more similar they are to the bound conformation, the lower the entropic cost of complexation. In mathematical terms, we can start from equation 1.6 and assume that $S_{PL} = S_{P,\mathrm{complexed}} + S_{L,\mathrm{complexed}}$; conformational similarity between the ligand free in solution and in its bound state then gives $S_{L} \approx S_{L,\mathrm{complexed}}$, and the ligand contribution to $\Delta S_{\mathrm{complexation}}$ becomes smaller. The study of the conformational dynamics of bioactive molecules in a biological medium is therefore of great importance for a better thermodynamic understanding of receptor-ligand complex formation. For this reason, Chapter II of this thesis focused on the structural characterization of chalcones and flavonoids and the mapping of their conformational dynamics in aqueous and organic solvent. The protocol used to calibrate the partial atomic charges drew on the knowledge gathered in Chapter I (described in sections 4.2 and 4.6), and the torsional terms needed to describe these molecules were parameterized using MP2/6-31G* calculations [185], in order to preserve conformational preferences that are sensitive to resonance effects in the aromatic rings, which can only be accounted for by quantum methods.
It is important to note that, unlike the torsions of the aromatic rings, which are highly flexible and hardly affect the physicochemical properties simulated in Chapter I, the accuracy of torsion calibration in more complex molecules directly affects the proper description of their conformational populations [266] and, consequently, the entropy of the free ligand in solution [143]. Simulating the molecules in organic solvent allowed a cross-comparison between interproton distances obtained experimentally from NOESY data and those computed along the trajectory, which increased confidence in the simulated conformations. The same molecules were then simulated in an aqueous environment in order to characterize the resulting conformational populations. Increasing the number of torsions in a molecule quickly increases its flexibility and hence the number of stochastically accessible conformations. For small ligands, transitions between conformational populations occur on time scales too short to be properly observed experimentally without the use of low temperatures to reduce the kinetic energy available for crossing energy barriers [147, 148]. Not infrequently, the signals obtained in 3D-NMR experiments are associated with the most abundant conformations, owing to their longer lifetime in solution [139]. Computational modeling thus becomes a powerful tool capable of describing these events with atomic resolution and on time scales suitable for proper characterization. A particular challenge, however, is to determine how much simulation time is enough for adequate conformational sampling.
This question has been raised more often for molecules with more degrees of freedom, such as proteins and peptides [267–270], but its rationale applies equally to small ligands: even if certain conformational energy minima are visited during the simulation, the reliability of any inference about the relative abundance of each conformation in solution depends directly on the occurrence of multiple conformational transition events. In other words, inference about the conformational abundances of small ligands is limited by the ergodicity of the system, that is, by whether all conformational states can be visited. To tackle the challenge of identifying and quantifying different dihedral populations, a computational tool named ConfID [143] was developed in partnership with the Laboratório de Bioinformática Estrutural e Computacional at the Instituto de Informática of UFRGS (appendix B.3). The tool relies on the idea that each molecular conformation results from a particular set of dihedral angles, so that combinations of different dihedral angles produce the possible conformational populations (Figure 12). ConfID also tracks and quantifies dihedral and conformational transition events, making it possible to assess the convergence of the simulated conformational sampling. For the chalcones studied in [143], simulations of 1.0 µs were performed, and dihedral abundances converged, on average, at 0.75 µs, while the number of distinct conformational populations converged, on average, at 0.2 µs. For the flavonoids, owing to their lower flexibility, dihedral abundances and the number of conformational populations converged earlier.
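The idea of labelling each frame by a tuple of dihedral populations can be sketched in a few lines. The code below is not ConfID's implementation or interface; it is a simplified illustration in which the population boundaries are supplied by hand (ConfID identifies them from the dihedral distributions), and all function names, cut points and angle values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def dihedral_population(angle, cuts):
    """Index of the population containing `angle` (degrees in [-180, 180)).

    `cuts` is a sorted list of boundary angles. Angles above the last cut wrap
    around and share population 0 with angles below the first cut, which
    accounts for the periodicity of dihedral angles.
    """
    return bisect_right(cuts, angle) % len(cuts)


def label_conformations(trajectory, cuts_per_dihedral):
    """Map every frame (a tuple of dihedral angles) to a tuple of populations."""
    return [
        tuple(dihedral_population(angle, cuts)
              for angle, cuts in zip(frame, cuts_per_dihedral))
        for frame in trajectory
    ]


def summarize(conformations):
    """Return (abundance in percent per conformation, transition counts)."""
    total = len(conformations)
    abundance = {conf: 100.0 * count / total
                 for conf, count in Counter(conformations).items()}
    transitions = Counter(
        (previous, current)
        for previous, current in zip(conformations, conformations[1:])
        if previous != current
    )
    return abundance, transitions


if __name__ == "__main__":
    # Two dihedrals, each split into three populations by made-up cut points.
    cuts = [[-120.0, 0.0, 120.0], [-120.0, 0.0, 120.0]]
    # Five made-up frames: (dihedral 1, dihedral 2) in degrees.
    frames = [(-170.0, 60.0), (175.0, 65.0), (-60.0, 58.0),
              (-65.0, 62.0), (178.0, 61.0)]
    labels = label_conformations(frames, cuts)
    abundance, transitions = summarize(labels)
    print(abundance)
    print(dict(transitions))
```

Counting transitions alongside abundances matters for the convergence argument above: a conformation that is entered only once gives little confidence in its estimated abundance, however long the simulation stays in it.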
Figure 12 – Schematic representation of the ConfID tool. Dihedral distributions and their respective angles as a function of time are provided as inputs (A), dihedral populations are identified (B), and the conformational populations (unique tuples of dihedral populations) are quantified (percentages inside the nodes), together with conformational transition events (percentages in red), which are listed and plotted as graphs (C).

Confirming ergodicity in the calculated trajectories allowed us to identify and quantify the most abundant conformational populations along a simulation trajectory, and to identify the impact that common substitutions in chalcones and flavonoids may have on their preferred conformations, which tends to be hard to intuit from the two-dimensional representation of these molecules alone. It was also possible to evaluate how the presence and type of solvent molecules explicitly interacting with the solute can significantly alter the relative abundance of its conformational populations. In the cases studied in Arantes et al. [143], intramolecular hydrogen bonds and even water-mediated interactions were commonly observed factors that played a crucial role in shifting the relative abundance of certain conformations, properties that are difficult to predict a priori. More specifically, the presence of carbohydrate residues attached to chalcones and flavonoids increased the number of sparsely populated conformations in aqueous solution, indicating that the interaction of these portions of the molecules with the solvent is decisive for the conformational preference of the ligand. In general, small ligands have few atoms that give rise to NOESY signals, which hampers their structural characterization [139].
Moreover, the sensors used in 3D-NMR techniques are not yet sensitive enough to detect molecular conformations with short half-lives, which is why low temperatures may be coupled to the technique, allowing better separation and, consequently, better detection of the conformational populations [147, 148]. In this respect, ConfID allowed the inference of a variety of structural and kinetic information that is difficult to access for ligands. For the chalcones and flavonoids studied in Arantes et al. [143], the tool correctly assigned up to 98% of the conformations generated for ligands with well-defined dihedral distributions (no overlap between distributions). For cases in which the boundary between dihedral distributions was ill-defined, our method was still able to assign 77% of the sampled conformations. A further advantage of ConfID is the easy identification of rare, short-lived conformations (relative abundance below 1%), which clustering or similarity-based methods would hardly be sensitive enough to identify. On the other hand, the conformational sampling evaluated depends directly on the accuracy of the topological parameters used, more specifically the torsional potentials, the partial atomic charges, and the Lennard-Jones parameters.

Figure 13 – As an example, a conformational characterization of a flavonoid simulated in organic solvent. The conformation of the molecule in green is the major one (61%), whereas the conformation in blue is the minor one (35%).

Identifying the possible conformational populations of ligands and quantifying them in solution allows inferences about their respective availability for complexation with the target receptor in a biological medium.
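As a rough illustration of how rare populations (relative abundance below 1%) and the stabilization of an abundance over simulation time might be inspected once every frame carries a population label, consider the following Python sketch. It is not taken from ConfID; the function names, the label format, and the default threshold argument are assumptions introduced here.

```python
from collections import Counter


def relative_abundances(labels):
    """Map each population label to its fraction of all frames."""
    total = len(labels)
    return {pop: n / total for pop, n in Counter(labels).items()}


def rare_populations(labels, threshold=0.01):
    """Return the populations whose relative abundance is below threshold."""
    return {
        pop: abundance
        for pop, abundance in relative_abundances(labels).items()
        if abundance < threshold
    }


def running_abundance(labels, population):
    """Cumulative abundance of one population after each frame.

    A curve that flattens over time suggests (but does not prove)
    that the abundance estimate has converged.
    """
    hits = 0
    series = []
    for frame_number, label in enumerate(labels, start=1):
        hits += label == population
        series.append(hits / frame_number)
    return series
```

Because every frame is labeled individually, a population visited in only a handful of frames is still reported, whereas a clustering cutoff chosen for the dominant states could merge it into a neighboring cluster.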
Among the possible applications of the data gathered here are molecular docking methods, which are well known for searching for the lowest-energy conformations of ligands bound to their target receptor but commonly lack specific torsional parameters for each bond type [32, 37, 271]. In theory, the dihedral abundances of a ligand or of its scaffold obtained in solution could be used to filter out energetically unfavorable conformations, especially for ligands with many degrees of freedom. ConfID can also be applied to simulations of receptor-ligand complexes, in order to identify the ligand conformations reached along the trajectory. It must be remembered, however, that verifying the ergodicity of the system is more complex in this case, since the conformational dynamics of the ligand is tightly coupled to that of its target receptor. In such cases, inferences about conformational abundances or the quantification of transition events should be treated with caution.

6.3 From biological insights to thermodynamic insights

The results shown in Chapter III are the fruit of the methodological maturation acquired over the previous chapters, from the parameterization of the synthetic ligand studied to its conformational characterization, enabling the study of its conformational dynamics in solution and the inference of the mechanism of its molecular recognition by the target receptor. The ligand in question is PIK-75, an inhibitor of glycogen synthase kinase-3β (GSK-3β). Its structure co-crystallized with GSK-3β was obtained through a partnership among the Universidade Federal do Rio de Janeiro, the Universidade Federal do ABC, and the University of Dortmund, together with the crystal structure of the ligand alone (single crystal).
With the ligand conformations available both in the receptor-bound state and in the crystalline state under the effect of crystal packing, our focus in [249] was to evaluate the conformational dynamics of the ligand in solution in order to infer more about its molecular recognition. The partial atomic charges and torsional terms of the ligand were parameterized according to Sections 4.2, 4.6 and 4.4, and the ligand was simulated in aqueous solvent for 1 µs for its conformational characterization. As discussed in Tesch et al. [249], evaluating the RMSD of the PIK-75 conformations visited in the trajectory, using the crystallographic structures of the complex and of the single crystal as references, identified conformations very similar to the bound conformation (RMSD ≈ 1.0 Å), suggesting that the "active" conformation of PIK-75 pre-exists in solution, before the receptor-ligand complex is formed. These results suggest that molecular recognition of PIK-75 by the GSK-3β enzyme proceeds through conformational selection of the ligand. In addition, ConfID identified 2 conformational populations with relative abundances of 46% and 54%, successfully assigning 98% of the conformations reached along the trajectory and counting 11540 conformational transition events between the populations. The conformations of the 2 populations were separated, and an RMSD analysis using the same crystallographic references allowed us to correlate the minority population with the conformation of the ligand bound to the receptor.

In principle, a medicinal chemist could propose a structural modification of PIK-75 with low enthalpic impact that nevertheless increases the relative abundance of the minority conformation, making the "active" conformation more available for recognition by the receptor, which could increase the potency of the new ligand. Furthermore, understanding the magnitude and modes of interaction of the drug with the solvent can provide a quantitative basis for estimating the enthalpic cost of the desolvation process. Thus, understanding the conformational dynamics of free ligands in solution and properly characterizing their conformational populations can provide data that assist researchers in the rational design of new drugs.

Our results rekindle the debate about models of molecular recognition, such as "induced fit" and "conformational selection". Although the case reported here fits one model, reflections on the mechanism by which small molecules are recognized by their target receptor remain valid and necessary. The scientific literature contains several well-documented cases supporting both models [272–275], reinforcing the idea that generalizations are not possible. In this context, accessing the entropy of complexation (ΔS_complexation) for the formation of receptor-ligand complexes through the structural characterization of free and bound ligands can provide valuable information for discriminating between models of molecular recognition and for the design of new drugs. Accurately calculating the entropy of a biological system has been a scientific challenge in molecular modeling over the last decades [276–281]. To a large extent, the limitations are associated with the ability of force field parameters to reproduce experimentally observed solute configurations, given the numerous degrees of freedom involved [218, 282], and with the limited computational capacity to cover rare (high potential energy) events [218, 223, 226].
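To make the link between conformational abundances and entropy concrete, the following Python sketch estimates a purely configurational term from discrete population abundances. It is an illustrative simplification and not a method used in this thesis: it assumes the populations are well-separated discrete states, applies the Gibbs entropy expression S = -R Σ p_i ln p_i, and ignores the entropy within each population as well as all solvent and receptor contributions. The function names and the default temperature are assumptions introduced here.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def conformational_entropy(abundances):
    """Gibbs entropy (J mol^-1 K^-1) of a set of discrete populations.

    abundances may be fractions or percentages; they are normalized first.
    """
    total = sum(abundances)
    probabilities = [a / total for a in abundances if a > 0]
    return -R * sum(p * math.log(p) for p in probabilities)


def selection_free_energy(abundance, temperature=298.15):
    """Ideal free-energy cost (J mol^-1), -RT ln p, of restricting the
    ligand to a single population with fractional abundance p."""
    return -R * temperature * math.log(abundance)


if __name__ == "__main__":
    # Abundances of the two PIK-75 populations reported in the text.
    print(conformational_entropy([46, 54]))
    print(selection_free_energy(0.46))
```

Under these assumptions, the two PIK-75 populations (46% and 54%) correspond to about 5.7 J mol^-1 K^-1, and restricting the ligand to the 46% population would cost roughly 1.9 kJ mol^-1 at 298.15 K. Such numbers are at best an order-of-magnitude guide to how much a shift in conformational abundance could matter.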
Until recently, simulations of receptor-ligand complex formation events were far beyond the reach of classical explicit-atom MD simulations. Currently, specialized supercomputers such as Anton, created in David E. Shaw's laboratory, are capable of simulating millisecond trajectories of small globular proteins [283, 284]. In addition, the growing popularity of enhanced sampling methods, such as metadynamics, has demonstrated the ability to scan larger portions of the energy surface related to receptor-ligand complex formation [222–228]. Although this computational and technical capacity is not yet in common use in the scientific community, further advances will certainly be made on both fronts in the coming decades, enabling conformational sampling of receptor-ligand complexes and a more accurate derivation of the associated thermodynamic properties. For now, the analytical methodology proposed in this thesis has a computational cost limited to the molecular dynamics simulation of the ligand itself, since the subsequent structural characterization performed by ConfID is almost instantaneous. Therefore, in view of the constant computational advances that outpace the hardware advances predicted by Moore's Law [285], we consider this methodology a valid strategy for small-scale application as a rational approach to prospecting potential ligand-receptor interactions in drug development.

Figure 14 – Scheme exemplifying the application of the ConfID tool to the conformational characterization of ligands free in solution or bound to their target receptor. Dotted circles denote conformational ensembles.

7 Conclusions

The present work aimed to study the flexibility of bioactive compounds free in solution through molecular dynamics simulations as a means of prospecting possible drug-receptor interactions.
From the results presented, it can be concluded that:

I. Development of methodologies for systematic ligand parameterization:
i. The protocol used to generate charge groups meets the premises of modularity and transferability usually applied in the development of force field parameters;
ii. The protocol proved competitive in describing the physicochemical properties of organic liquids;
iii. Characterizing the interactions of aromatic rings with the solvent generated an extensive volume of quantitative data that can be used by medicinal chemistry in the design of new drugs;

II. Systematic analysis of the conformational sampling of molecules in solution:
i. Generating new torsional profiles based on quantum-mechanical data allowed an accurate conformational description of the molecules studied;
ii. Evaluating the impact of substitutions neighboring the torsions on the rotational barrier allowed a better understanding of the dynamics of the molecules studied;
iii. Characterizing and quantifying the simulated conformational populations provides input that can be used to increase the accuracy of molecular docking studies;
iv. The analysis methodology employed in this work led to the creation of a tool for analyzing the conformational sampling of ligands.

III. Dynamics of ligands in solution and its impact on molecular recognition:
i. Applying the protocols for deriving partial charges and generating new torsional terms allowed a detailed study of the dynamics of PIK-75 in solution;
ii. The tool developed allowed a detailed characterization and quantification of the conformational populations of PIK-75 in solution, demonstrating its potential for application to other ligands;
iii. Characterizing and quantifying the conformational populations of PIK-75 in solution allowed inferences about the molecular recognition events of the ligand by GSK-3β.

8 Perspectives

"Science never solves a problem without creating ten more." George Bernard Shaw

The development of this work opens perspectives for the conformational description of bioactive molecules in the field of medicinal chemistry and the development of new drugs, such as:
• Expanding the number of molecular patterns parameterized within the GROMOS philosophy;
• Implementing automated routines for topology construction and validation;
• Correlating conformational data of ligands with thermodynamic data of binding to the target receptor obtained by ITC;
• Systematically evaluating the structure and configuration of solvents around the different conformational populations of ligands in solution.

References

1 NG, R. Drugs: from discovery to approval. 3rd ed. Singapore: Wiley & Sons, Inc., 2015. ISBN 9781118907276.
2 GUIDO, R. V. C.; ANDRICOPULO, A. D.; OLIVA, G. Planejamento de fármacos, biotecnologia e química medicinal: aplicações em doenças infecciosas. Estudos Avançados, Instituto de Estudos Avançados da Universidade de São Paulo, v. 24, n. 70, p. 81–98, 2010. ISSN 0103-4014.
3 ANDRICOPULO, A. D. et al. Recent Trends in Structure-Based Drug Design and Energetics. In: Burger’s Medicinal Chemistry and Drug Discovery. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2010.
4 MEANWELL, N. A. Improving Drug Design: An Update on Recent Applications of Efficiency Metrics, Strategies for Replacing Problematic Elements, and Compounds in Nontraditional Drug Space. Chemical Research in Toxicology, v. 29, n. 4, p. 564–616, apr 2016. ISSN 0893-228X.
5 WERMUTH, C. et al. Glossary of terms used in medicinal chemistry. IUPAC Recommendations, 1997.
6 SOLLANO, J. et al. The Economics of Drug Discovery and the Ultimate Valuation of Pharmacotherapies in the Marketplace. Clinical Pharmacology & Therapeutics, v. 84, n. 2, p. 263–266, aug 2008. ISSN 0009-9236.
7 HUGHES, J. P. et al. Principles of early drug discovery. British journal of pharmacology, Wiley-Blackwell, v. 162, n. 6, p. 1239–49, mar 2011. ISSN 1476-5381.
8 SUZUKI, A. Quantitative Decision Making in Drug Development: Pharmacometrics. YAKUGAKU ZASSHI, v. 136, n. 4, p. 537–542, apr 2016. ISSN 0031-6903.
9 BOWES, J. et al. Reducing safety-related drug attrition: the use of in vitro pharmacological profiling. Nature Reviews Drug Discovery, v. 11, 2012.
10 WISE, L. et al. New approaches to drug safety: a pharmacovigilance tool kit. Nature Reviews Drug Discovery, Nature Publishing Group, v. 8, n. 10, p. 779, sep 2009. ISSN 1474-1776.
11 LOMBARDINO, J. G.; LOWE, J. A. A guide to drug discovery: The role of the medicinal chemist in drug discovery - then and now. Nature Reviews Drug Discovery, v. 3, n. 10, p. 853–862, oct 2004. ISSN 1474-1776.
12 AITKEN, M.; KLEINROCK, M.; NASS, D. Outlook for Global Medicines through 2021. [S.l.], 2016. 58 p.
13 ANDERSON, A. C. The Process of Structure-Based Drug Design. Chemistry & Biology, Elsevier, v. 10, n. 9, p. 787–797, sep 2003. ISSN 1074-5521.
14 RANKOVIC, Z.; MORPHY, R. (Ed.). Lead Generation Approaches in Drug Discovery. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2010. ISBN 9780470584170.
15 KESERÜ, G. M.; MAKARA, G. M. The influence of lead discovery strategies on the properties of drug candidates. Nature Reviews Drug Discovery, Nature Publishing Group, v. 8, n. 3, p. 203–212, mar 2009. ISSN 1474-1776.
16 PHRMA. Clinical Trials.
17 AHERNE, G. W.; MCDONALD, E.; WORKMAN, P. Finding the needle in the haystack: why high-throughput screening is good for your health. Breast cancer research : BCR, BioMed Central, v. 4, n. 4, p. 148–54, 2002. ISSN 1465-5411.
18 DEPREZ-POULAIN, R.; DEPREZ, B. Facts, figures and trends in lead generation. Current topics in medicinal chemistry, v. 4, n. 6, p. 569–80, 2004. ISSN 1568-0266.
19 BOHACEK, R. S.; MCMARTIN, C.; GUIDA, W. C. The art and practice of structure-based drug design: A molecular modeling perspective. Medicinal Research Reviews, Wiley-Blackwell, v. 16, n. 1, p. 3–50, jan 1996. ISSN 0198-6325.
20 SZYMAŃSKI, P.; MARKOWICZ, M.; MIKICIUK-OLASIK, E. Adaptation of high-throughput screening in drug discovery-toxicological screening tests. International journal of molecular sciences, Multidisciplinary Digital Publishing Institute (MDPI), v. 13, n. 1, p. 427–52, 2012. ISSN 1422-0067.
21 BERMAN, H. M. et al. The Protein Data Bank. Nucleic acids research, v. 28, n. 1, p. 235–42, jan 2000. ISSN 0305-1048.
22 BERMAN, H. M. et al. Trendspotting in the Protein Data Bank. FEBS Letters, v. 587, n. 8, p. 1036–1045, apr 2013. ISSN 0014-5793.
23 GREBNER, C. et al. Exploring Binding Mechanisms in Nuclear Hormone Receptors by Monte Carlo and X-ray-derived Motions. Biophysical Journal, Cell Press, v. 112, n. 6, p. 1147–1156, mar 2017. ISSN 0006-3495.
24 BLEICHER, K. H. et al. Hit and lead generation: beyond high-throughput screening. Nature reviews. Drug discovery, v. 2, n. 5, p. 369–78, may 2003. ISSN 1474-1776.
25 DAVIS, A.; STGALLAY, S.; KLEYWEGT, G. Limitations and lessons in the use of X-ray structural information in drug design. Drug Discovery Today, v. 13, n. 19-20, p. 831–841, oct 2008. ISSN 1359-6446.
26 DAVIS, A. M.; TEAGUE, S. J.; KLEYWEGT, G. J. Application and Limitations of X-ray Crystallographic Data in Structure-Based Ligand and Drug Design. Angewandte Chemie International Edition, v. 42, n. 24, p. 2718–2736, jun 2003. ISSN 1433-7851.
27 KITCHEN, D. B. et al. Docking and scoring in virtual screening for drug discovery: methods and applications. Nature Reviews Drug Discovery, v. 3, n. 11, p. 935–949, nov 2004. ISSN 1474-1776.
28 ALVAREZ, J. C. High-throughput docking as a source of novel drug leads. Current Opinion in Chemical Biology, v. 8, n. 4, p. 365–370, aug 2004. ISSN 1367-5931.
29 MENG, X.-Y. et al. Molecular docking: a powerful approach for structure-based drug discovery. Current computer-aided drug design, NIH Public Access, v. 7, n. 2, p. 146–57, jun 2011. ISSN 1875-6697.
30 KLEBE, G. Virtual ligand screening: strategies, perspectives and limitations. Drug discovery today, v. 11, n. 13-14, p. 580–94, jul 2006. ISSN 1359-6446.
31 LEACH, A. R.; SHOICHET, B. K.; PEISHOFF, C. E. Prediction of Protein-Ligand Interactions. Docking and Scoring: Successes and Gaps. Journal of Medicinal Chemistry, American Chemical Society, v. 49, n. 20, p. 5851–5855, 2006.
32 JONES, G. et al. Development and validation of a genetic algorithm for flexible docking. Journal of Molecular Biology, Academic Press, v. 267, n. 3, p. 727–748, apr 1997. ISSN 0022-2836.
33 FRIESNER, R. A. et al. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. Journal of Medicinal Chemistry, v. 47, n. 7, p. 1739–1749, mar 2004. ISSN 0022-2623.
34 HALGREN, T. A. et al. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening. Journal of Medicinal Chemistry, v. 47, n. 7, p. 1750–1759, mar 2004. ISSN 0022-2623.
35 RAREY, M. et al. A Fast Flexible Docking Method using an Incremental Construction Algorithm. Journal of Molecular Biology, v. 261, n. 3, p. 470–489, aug 1996. ISSN 0022-2836.
36 MORRIS, G. M. et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry, John Wiley & Sons, Ltd, v. 19, n. 14, p. 1639–1662, nov 1998. ISSN 0192-8651.
37 TROTT, O.; OLSON, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, NIH Public Access, v. 31, n. 2, p. 455–61, jan 2010. ISSN 1096-987X.
38 PAGADALA, N. S.; SYED, K.; TUSZYNSKI, J. Software for molecular docking: a review. Biophysical Reviews, v. 9, n. 2, p. 91–102, apr 2017. ISSN 1867-2450.
39 DUDEK, A. Z.; ARODZ, T.; GALVEZ, J. Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Combinatorial chemistry & high throughput screening, v. 9, n. 3, p. 213–28, mar 2006. ISSN 1386-2073.
40 WINKLER, D. A. The role of quantitative structure–activity relationships (QSAR) in biomolecular discovery. Briefings in bioinformatics, v. 3, n. 1, p. 73–86, mar 2002. ISSN 1467-5463.
41 GUHA, R. On exploring structure-activity relationships. Methods in molecular biology (Clifton, N.J.), NIH Public Access, v. 993, p. 81–94, 2013. ISSN 1940-6029.
42 WIENER, H. Structural Determination of Paraffin Boiling Points. Journal of the American Chemical Society, American Chemical Society, v. 69, n. 1, p. 17–20, jan 1947. ISSN 0002-7863.
43 HANSCH, C. et al. Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients. Nature, Nature Publishing Group, v. 194, n. 4824, p. 178–180, apr 1962. ISSN 0028-0836.
44 HANSCH, C.; FUJITA, T. ρ-σ-π Analysis. A Method for the Correlation of Biological Activity and Chemical Structure. Journal of the American Chemical Society, American Chemical Society, v. 86, n. 8, p. 1616–1626, apr 1964. ISSN 0002-7863.
45 CARHART, R. E.; SMITH, D. H.; VENKATARAGHAVAN, R. Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Modeling, v. 25, n. 2, p. 64–73, may 1985. ISSN 1549-9596.
46 ROWLETT, R. J. Perspectives on editorial operations of Chemical Abstracts Service. Journal of Chemical Information and Modeling, v. 25, n. 2, p. 61–64, may 1985. ISSN 1549-9596.
47 WILLETT, P.; WINTERMAN, V.; BAWDEN, D. Implementation of nearest-neighbor searching in an online chemical structure search system. Journal of Chemical Information and Modeling, v. 26, n. 1, p. 36–41, feb 1986. ISSN 1549-9596.
48 ZVINAVASHE, E.; MURK, A. J.; RIETJENS, I. M. C. M. Promises and Pitfalls of Quantitative Structure-Activity Relationship Approaches for Predicting Metabolism and Toxicity. Chemical Research in Toxicology, American Chemical Society, v. 21, n. 12, p. 2229–2236, dec 2008. ISSN 0893-228X.
49 CRAMER, R. D.; PATTERSON, D. E.; BUNCE, J. D. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. Journal of the American Chemical Society, American Chemical Society, v. 110, n. 18, p. 5959–5967, aug 1988. ISSN 0002-7863.
50 CRAMER, R. D.; PATTERSON, D. E.; BUNCE, J. D. Recent advances in comparative molecular field analysis (CoMFA). Progress in clinical and biological research, v. 291, p. 161–5, 1989. ISSN 0361-7742.
51 KLEBE, G.; ABRAHAM, U.; MIETZNER, T. Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. Journal of medicinal chemistry, v. 37, n. 24, p. 4130–46, nov 1994. ISSN 0022-2623.
52 KLEBE, G.; ABRAHAM, U. Comparative molecular similarity index analysis (CoMSIA) to study hydrogen-bonding properties and to score combinatorial libraries. Journal of computer-aided molecular design, v. 13, n. 1, p. 1–10, jan 1999. ISSN 0920-654X.
53 CASTILHO, M. S. et al. Two- and three-dimensional quantitative structure-activity relationships for a series of purine nucleoside phosphorylase inhibitors. Bioorganic & medicinal chemistry, v. 14, n. 2, p. 516–27, jan 2006. ISSN 0968-0896.
54 SIPPL, W. Development of biologically active compounds by combining 3D QSAR and structure-based design methods. Journal of Computer-Aided Molecular Design, Kluwer Academic Publishers, v. 16, n. 11, p. 825–830, 2002. ISSN 0920-654X.
55 SALUM, L. B.; POLIKARPOV, I.; ANDRICOPULO, A. D. Structure-Based Approach for the Study of Estrogen Receptor Binding Affinity and Subtype Selectivity. Journal of Chemical Information and Modeling, American Chemical Society, v. 48, n. 11, p. 2243–2253, nov 2008. ISSN 1549-9596.
56 ATKINS, P. W.; DE PAULA, J.; KEELER, J. Atkins’ Physical chemistry. [S.l.: s.n.], 2018. 908 p. ISBN 9780198769866.
57 SILBERBERG, M. S. Principles of general chemistry. [S.l.]: McGraw-Hill, 2013. ISBN 0073402699.
58 LEHNINGER, A.; NELSON, D. L.; COX, M. M. Lehninger Principles of Biochemistry. Fifth edition. [S.l.]: W. H. Freeman, 2008. Hardcover.
59 VOET, D.; VOET, J. G. Bioquimica. [S.l.]: Artmed, 2013. ISBN 8582710046.
60 KARPLUS, M. et al. Protein dynamics: From the native to the unfolded state and back again. Molecular Engineering, Kluwer Academic Publishers, v. 5, n. 1-3, p. 55–70, mar 1995. ISSN 0925-5125.
61 KARPLUS, M.; WEAVER, D. L. Protein-folding dynamics. Nature, v. 260, n. 5550, p. 404–406, apr 1976. ISSN 0028-0836.
62 LAAGE, D.; ELSAESSER, T.; HYNES, J. T. Water Dynamics in the Hydration Shells of Biomolecules. Chemical Reviews, American Chemical Society, v. 117, n. 16, p. 10694–10725, aug 2017. ISSN 0009-2665.
63 LEVINTHAL, C. How to Fold Graciously. In: DEBRUNNDER, J. T. P.; MUNCK, E. (Ed.). Mossbauer Spectroscopy in Biological Systems: Proceedings of a meeting held at Allerton House, Monticello, Illinois. [S.l.]: University of Illinois Press, 1969. p. 22–24.
64 BOWMAN, G. R.; VOELZ, V. A.; PANDE, V. S. Taming the complexity of protein folding. Current opinion in structural biology, NIH Public Access, v. 21, n. 1, p. 4–11, feb 2011. ISSN 1879-033X.
65 NAGANATHAN, A. N.; MUÑOZ, V. Scaling of Folding Times with Protein Size. Journal of the American Chemical Society, American Chemical Society, v. 127, n. 2, p. 480–481, 2005.
66 BRYNGELSON, J. D.; WOLYNES, P. G. Spin glasses and the statistical mechanics of protein folding. Proceedings of the National Academy of Sciences of the United States of America, National Academy of Sciences, v. 84, n. 21, p. 7524–8, nov 1987. ISSN 0027-8424.
67 BRYNGELSON, J. D.; WOLYNES, P. G. Intermediates and barrier crossing in a random energy model (with applications to protein folding). The Journal of Physical Chemistry, American Chemical Society, v. 93, n. 19, p. 6902–6915, sep 1989. ISSN 0022-3654.
68 GO, N. Theoretical Studies of Protein Folding. Annual Review of Biophysics and Bioengineering, v. 12, n. 1, p. 183–210, jun 1983. ISSN 0084-6589.
69 ABE, H.; GO, N. Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers, Wiley-Blackwell, v. 20, n. 5, p. 1013–1031, may 1981. ISSN 0006-3525.
70 ONUCHIC, J. N. et al. Protein folding funnels: the nature of the transition state ensemble. Folding and Design, Cell Press, v. 1, n. 6, p. 441–450, dec 1996. ISSN 1359-0278.
71 LEOPOLD, P. E.; MONTAL, M.; ONUCHIC, J. N. Protein folding funnels: a kinetic approach to the sequence-structure relationship. Proceedings of the National Academy of Sciences of the United States of America, National Academy of Sciences, v. 89, n. 18, p. 8721–5, sep 1992. ISSN 0027-8424.
72 MÜLLER, D. J.; WU, N.; PALCZEWSKI, K. Vertebrate membrane proteins: structure, function, and insights from biophysical approaches. Pharmacological reviews, American Society for Pharmacology and Experimental Therapeutics, v. 60, n. 1, p. 43–78, mar 2008. ISSN 1521-0081.
73 KARPLUS, M. Dynamical aspects of molecular recognition. Journal of Molecular Recognition, John Wiley & Sons, Ltd, v. 23, n. 2, p. 102–104, mar 2010. ISSN 0952-3499.
74 DOBBS, K. D.; HEHRE, W. J. Molecular orbital theory of the properties of inorganic and organometallic compounds 4. Extended basis sets for third- and fourth-row, main-group elements. J. Comput. Chem., John Wiley & Sons, Inc., v. 7, n. 3, p. 359–378, jun 1986. ISSN 0192-8651.
75 DINNER, A. R. et al. Understanding protein folding via free-energy surfaces from theory and experiment. Trends in biochemical sciences, v. 25, n. 7, p. 331–9, jul 2000. ISSN 0968-0004.
76 GÖBL, C. et al. Application of Solution NMR Spectroscopy to Study Protein Dynamics. Entropy, Molecular Diversity Preservation International, v. 14, n. 3, p. 581–598, mar 2012. ISSN 1099-4300.
77 BYCROFT, M. et al. Detection and characterization of a folding intermediate in barnase by NMR. Nature, Nature Publishing Group, v. 346, n. 6283, p. 488–490, aug 1990. ISSN 0028-0836.
78 CALLIES, O.; HERNÁNDEZ DARANAS, A. Application of isothermal titration calorimetry as a tool to study natural product interactions. Natural Product Reports, The Royal Society of Chemistry, v. 33, n. 7, p. 881–904, jun 2016. ISSN 0265-0568.
79 FALCONER, R. J. Applications of isothermal titration calorimetry - the research and technical developments from 2011 to 2015. Journal of Molecular Recognition, v. 29, n. 10, p. 504–515, oct 2016. ISSN 0952-3499.
80 AZEVEDO, W. F. de; DIAS, R. Experimental approaches to evaluate the thermodynamics of protein-drug interactions. Current drug targets, v. 9, n. 12, p. 1071–6, dec 2008. ISSN 1873-5592.
81 CELEJ, M. S. et al. Differential scanning calorimetry as a tool to estimate binding parameters in multiligand binding proteins. Analytical Biochemistry, v. 350, n. 2, p. 277–284, mar 2006. ISSN 0003-2697.
82 CHIU, M. H.; PRENNER, E. J. Differential scanning calorimetry: An invaluable tool for a detailed thermodynamic characterization of macromolecules and their interactions. Journal of pharmacy & bioallied sciences, Wolters Kluwer – Medknow Publications, v. 3, n. 1, p. 39–59, jan 2011. ISSN 0975-7406.
83 HANSEN, L. D.; CHRISTENSEN, J. J.; IZATT, R. M. Entropy titration. A calorimetric method for the determination of ∆G(K), ∆H and ∆S. Chem. Commun. (London), The Royal Society of Chemistry, v. 0, n. 3, p. 36–38, jan 1965. ISSN 0009-241X.
84 CHRISTENSEN, J. J. et al. Entropy Titration. A Calorimetric Method for the Determination of ∆G, ∆H, and ∆S from a Single Thermometric Titration. The Journal of Physical Chemistry, American Chemical Society, v. 70, n. 6, p. 2003–2010, jun 1966.
85 BILTONEN, R. L.; LANGERMAN, N. Microcalorimetry for biological chemistry: experimental design, data analysis, and interpretation. Methods in enzymology, v. 61, p. 287–318, 1979. ISSN 0076-6879.
86 WISEMAN, T. et al. Rapid measurement of binding constants and heats of binding using a new titration calorimeter. Analytical biochemistry, v. 179, n. 1, p. 131–7, may 1989. ISSN 0003-2697.
87 FREIRE, E.; MAYORGA, O. L.; STRAUME, M. Isothermal Titration Calorimetry. Analytical Chemistry, American Chemical Society, v. 62, n. 18, p. 950A–959A, sep 1990. ISSN 0003-2700.
88 CHO, S. et al. Assessing Energetic Contributions to Binding from a Disordered Region in a Protein-Protein Interaction. Biochemistry, American Chemical Society, v. 49, n. 43, p. 9256–9268, nov 2010. ISSN 0006-2960.
89 LADBURY, J. E.; KLEBE, G.; FREIRE, E. Adding calorimetric data to decision making in lead discovery: a hot tip. Nature Reviews Drug Discovery, Nature Publishing Group, v. 9, n. 1, p. 23–27, jan 2010. ISSN 1474-1776.
90 FREYER, M. W.; LEWIS, E. A. Isothermal Titration Calorimetry: Experimental Design, Data Analysis, and Probing Macromolecule/Ligand Binding and Kinetic Interactions. Methods in Cell Biology, Academic Press, v. 84, p. 79–113, jan 2008. ISSN 0091-679X.
91 LEAVITT, S.; FREIRE, E. Direct measurement of protein binding energetics by isothermal titration calorimetry. Current Opinion in Structural Biology, Elsevier Current Trends, v. 11, n. 5, p. 560–566, sep 2001. ISSN 0959-440X.
92 RAJARATHNAM, K.; RÖSGEN, J. Isothermal titration calorimetry of membrane proteins - Progress and challenges. Biochimica et Biophysica Acta (BBA) Biomembranes, Elsevier, v. 1838, n. 1, p. 69–77, jan 2014. ISSN 0005-2736.
93 FEIG, A. L. Applications of isothermal titration calorimetry in RNA biochemistry and biophysics. Biopolymers, John Wiley & Sons, Ltd, v. 87, n. 5-6, p. 293–301, dec 2007. ISSN 0006-3525.
94 SALIM, N. N.; FEIG, A. L. Isothermal titration calorimetry of RNA. Methods, Academic Press, v. 47, n. 3, p. 198–205, mar 2009. ISSN 1046-2023.
95 BROWN, A. Analysis of Cooperativity by Isothermal Titration Calorimetry.
International Journal of Molecular Sciences, Molecular Diversity Preservation International, v. 10, n. 8, p. 3457–3477, aug 2009. ISSN 1422-0067. Disponível em: . Citado na página 29, 96 STEUBER, H. et al. Tracing Changes in Protonation: A Prerequisite to Factorize Thermodynamic Data of Inhibitor Binding to Aldose Reductase. Journal of Molecular Biology, Academic Press, v. 373, n. 5, p. 1305–1320, nov 2007. ISSN 0022-2836. Citado na página 29, 97 CZODROWSKI, P.; SOTRIFFER, C. A.; KLEBE, G. Protonation Changes upon Ligand Binding to Trypsin and Thrombin: Structural Interpretation Based on pKa Calculations and ITC Experiments. Journal of Molecular Biology, Academic Press, v. 367, n. 5, p. 1347–1356, apr 2007. ISSN 0022-2836. Citado na página 29, 98 JIN, L. et al. Ca2+ and Mg2+ bind tetracycline with distinct stoichiometries and linked deprotonation. Biophysical Chemistry, Elsevier, v. 128, n. 2-3, p. 185–196, jul 2007. ISSN 0301-4622. Citado na página 29, 99 NILSSON, M. et al. Thermodynamic and Kinetic Characterization of HostGuest Association between Bolaform Surfactants and α- and βCyclodextrins. The Journal of Physical Chemistry B, American Chemical Society, v. 112, n. 36, p. 11310–11316, sep 2008. ISSN 1520-6106. Disponível em: . Citado na página 29, 100 BURNOUF, D. et al. kinITC: A New Method for Obtaining Joint Thermodynamic and Kinetic Data by Isothermal Titration Calorimetry. Journal of the American Chemical Society, American Chemical Society, v. 134, n. 1, p. 559–565, jan 2012. ISSN 0002-7863. Disponível em: . Citado na página 29, 101 Vander Meulen, K. A.; BUTCHER, S. E. Characterization of the kinetic and thermodynamic landscape of RNA folding using a novel application of isothermal titration calorimetry. Nucleic Acids Research, Oxford University Press, v. 40, n. 5, p. 2140–2151, mar 2012. ISSN 1362-4962. Disponível em: . Citado na página 29, 102 Di Trani, J. M. et al. Rapid measurement of inhibitor binding kinetics by isothermal titration calorimetry. 
Nature Communications, Nature Publishing Group, v. 9, n. 1, p. 893, dec 2018. ISSN 2041-1723. Disponível em: . Citado na página 29, 103 PAN, A. C. et al. Molecular determinants of drug–receptor binding kinetics. Drug Discovery Today, v. 18, n. 13-14, p. 667–673, jul 2013. ISSN 13596446. Disponível em: . Citado na página 29, Referências 180 104 TANG, Z.; ROBERTS, C. C.; CHANG, C.-E. A. Understanding ligandreceptor non-covalent binding kinetics using molecular modeling. Frontiers in bioscience (Landmark edition), NIH Public Access, v. 22, p. 960–981, 2017. ISSN 1093-4715. Disponível em: . Citado na página 29, 105 DU, X. et al. Insights into Protein-Ligand Interactions: Mechanisms, Models, and Methods. International journal of molecular sciences, Multidisciplinary Digital Publishing Institute (MDPI), v. 17, n. 2, jan 2016. ISSN 14220067. Disponível em: . Citado 2 vezes nas páginas 29 e 30, 106 SCHMIDTKE, P. et al. Shielded Hydrogen Bonds as Structural Determinants of Binding Kinetics: Application in Drug Design. Journal of the American Chemical Society, American Chemical Society, v. 133, n. 46, p. 18903–18910, nov 2011. ISSN 0002-7863. Disponível em: . Citado na página 29, 107 SETNY, P. et al. Dewetting-Controlled Binding of Ligands to Hydrophobic Pockets. Physical Review Letters, v. 103, n. 18, p. 187801, oct 2009. ISSN 0031-9007. Disponível em: . Citado na página 29, 108 SETNY, P. et al. Solvent fluctuations in hydrophobic cavity-ligand binding kinetics. Proceedings of the National Academy of Sciences of the United States of America, National Academy of Sciences, v. 110, n. 4, p. 1197–202, jan 2013. ISSN 1091-6490. Disponível em: . Citado na página 29, 109 UCHIDA, T.; ISHIMORI, K.; MORISHIMA, I. The effects of heme pocket hydrophobicity on the ligand binding dynamics in myoglobin as studied with leucine 29 mutants. The Journal of biological chemistry, American Society for Biochemistry and Molecular Biology, v. 272, n. 48, p. 30108–14, nov 1997. ISSN 0021-9258. 
Disponível em: . Citado na página 29, 110 LIU, L. et al. Evidence that Water Can Reduce the Kinetic Stability of ProteinHydrophobic Ligand Interactions. Journal of the American Chemical Society, American Chemical Society, v. 132, n. 50, p. 17658–17660, dec 2010. ISSN 0002-7863. Disponível em: . Citado na página 29, 111 KASTRITIS, P. L.; BONVIN, A. M. J. J. On the binding affinity of macromolecular interactions: daring to ask why proteins interact. Journal of the Royal Society, Interface, The Royal Society, v. 10, n. 79, p. 20120835, feb 2013. ISSN 1742-5662. Disponível em: . Citado na página 30, 112 DUNITZ, J. D. Win some, lose some: enthalpy-entropy compensation in weak intermolecular interactions. Chemistry & biology, v. 2, n. 11, p. 709–12, nov 1995. ISSN 1074-5521. Disponível em: . Citado na página 30, 113 ZHAO, H.; DIETRICH, J. Privileged scaffolds in lead generation. Expert Opinion on Drug Discovery, Informa Healthcare, v. 10, n. 7, p. 781–790, jul 2015. ISSN 1746-0441. Disponível em: . Citado 2 vezes nas páginas 30 e 31, 114 YU, W.; MACKERELL, A. D. Computer-Aided Drug Design Methods. In: Methods in molecular biology (Clifton, N.J.). [S.l.: s.n.], 2017. v. 1520, p. 85–106. Citado 2 vezes nas páginas 30 e 31, 115 GUIDO, R. V. C.; OLIVA, G.; ANDRICOPULO, A. D. Structure- and ligand-based drug design approaches for neglected tropical diseases. Pure and Applied Chemistry, De Gruyter, v. 84, n. 9, p. 1857–1866, may 2012. ISSN 1365-3075. Citado 2 vezes nas páginas 30 e 31, 116 SHULTZ, M. D. Two Decades under the Influence of the Rule of Five and the Changing Properties of Approved Oral Drugs. Journal of Medicinal Chemistry, American Chemical Society, p. acs.jmedchem.8b00686, sep 2018. ISSN 0022-2623. Disponível em: . Citado 2 vezes nas páginas 30 e 31, 117 PEROZZO, R.; FOLKERS, G.; SCAPOZZA, L. Thermodynamics of proteinligand interactions: history, presence, and future aspects. Journal of receptor and signal transduction research, v. 24, n. 1-2, p. 1–52, feb 2004. 
ISSN 1079-9893. Disponível em: . Citado na página 30, 118 FREIRE, E. A thermodynamic approach to the affinity optimization of drug candidates. Chem. Biol. Drug Des., v. 74, n. 5, p. 468–472, nov 2009. ISSN 17470277. Disponível em: . Citado 2 vezes nas páginas 31 e 41, 119 BLUNDELL, C. D.; NOWAK, T.; WATSON, M. J. Measurement, Interpretation and Use of Free Ligand Solution Conformations in Drug Discovery. In: Progress in Medicinal Chemistry. Elsevier, 2016. v. 55, cap. 2, p. 45–147. ISBN 9780444637154. Disponível em: . Citado 2 vezes nas páginas 31 e 41, 120 ZHANG, H. et al. Quantification of Solvent Contribution to the Stability of Noncovalent Complexes. Journal of Chemical Theory and Computation, American Chemical Society, v. 9, n. 10, p. 4542–4551, oct 2013. ISSN 1549-9618. Disponível em: . Citado na página 31, Referências 182 121 CLAVERIA-GIMENO, R. et al. A look at ligand binding thermodynamics in drug discovery. Expert Opinion on Drug Discovery, Taylor & Francis, v. 12, n. 4, p. 363–377, apr 2017. ISSN 1746-0441. Disponível em: . Citado na página 31, 122 GILSON, M. K. et al. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Research, Narnia, v. 44, n. D1, p. D1045–D1053, jan 2016. ISSN 0305-1048. Disponível em: . Citado na página 31, 123 WILLIAMS, G. et al. Binding thermodynamics discriminates fragments from druglike compounds: a thermodynamic description of fragment-based drug discovery. Drug Discovery Today, Elsevier Current Trends, v. 22, n. 4, p. 681–689, apr 2017. ISSN 1359-6446. Disponível em: . Citado na página 31, 124 TARCSAY, Á.; KESERű, G. M. Is there a link between selectivity and binding thermodynamics profiles? Drug Discovery Today, Elsevier Current Trends, v. 20, n. 1, p. 86–94, jan 2015. ISSN 1359-6446. Disponível em: . Citado na página 31, 125 REYNOLDS, C. H.; HOLLOWAY, M. K. Thermodynamics of ligand binding and efficiency. ACS Med. Chem. 
Lett., American Chemical Society, v. 2, n. 6, p. 433–437, jun 2011. ISSN 19485875. Disponível em: . Citado na página 31, 126 HANN, M. M. Molecular obesity, potency and other addictions in drug discovery. MedChemComm, The Royal Society of Chemistry, v. 2, n. 5, p. 349, may 2011. ISSN 2040-2503. Disponível em: . Citado na página 31, 127 Russo Krauss, I. et al. An Overview of Biological Macromolecule Crystallization. International Journal of Molecular Sciences, Multidisciplinary Digital Publishing Institute, v. 14, n. 6, p. 11643–11691, may 2013. ISSN 1422-0067. Disponível em: . Citado na página 31, 128 WLODAWER, A. et al. Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. The FEBS journal, NIH Public Access, v. 275, n. 1, p. 1–21, jan 2008. ISSN 1742-464X. Disponível em: . Citado na página 31, 129 MALUF, F. V. et al. Cristalografia. In: Bioinformática: da Biologia à Flexibilidade Moleculares. 1a. ed. [S.l.: s.n.], 2014. cap. 13, p. 282. ISBN 978-85-69288-00-8. Citado 2 vezes nas páginas 32 e 33, Referências 183 130 DRENTH, J. Principles of Protein X-Ray Crystallography. 3. ed. New York, NY: Springer New York, 2007. ISBN 978-0-387-33334-2. Disponível em: . Citado na página 32, 131 WLODAWER, A.; DAUTER, Z.; JASKOLSKI, M. (Ed.). Protein Crystallography. New York, NY: Springer New York, 2017. v. 1607. (Methods in Molecular Biology, v. 1607). ISBN 978-1-4939-6998-2. Disponível em: . Citado na página 32, 132 SMART, O. S. et al. Validation of ligands in macromolecular structures determined by X-ray crystallography. Acta crystallographica. Section D, Structural biology, International Union of Crystallography, v. 74, n. Pt 3, p. 228–236, 2018. ISSN 2059-7983. Citado 2 vezes nas páginas 32 e 33, 133 MCNAE, I. W. et al. Studying protein–ligand interactions using protein crystallography. Crystallography Reviews, Taylor & Francis Group, v. 11, n. 1, p. 61–71, jan 2005. ISSN 0889-311X. Disponível em: . 
Citado na página 32, 134 DAVIES, D. R. Screening Ligands by X-ray Crystallography. In: Methods in molecular biology (Clifton, N.J.). [s.n.], 2014. v. 1140, p. 315–323. Disponível em: . Citado 2 vezes nas páginas 32 e 33, 135 LIEBESCHUETZ, J. et al. The good, the bad and the twisted: a survey of ligand geometry in protein crystal structures. Journal of Computer-Aided Molecular Design, v. 26, n. 2, p. 169–183, feb 2012. ISSN 0920-654X. Citado na página 32, 136 REYNOLDS, C. H. Protein–Ligand Cocrystal Structures: We Can Do Better. ACS Medicinal Chemistry Letters, v. 5, n. 7, p. 727–729, jul 2014. ISSN 1948-5875. Citado na página 32, 137 LONG, F. et al. AceDRG: a stereochemical description generator for ligands. Acta Crystallographica Section D Structural Biology, v. 73, n. 2, p. 112–122, feb 2017. ISSN 2059-7983. Citado na página 33, 138 DROR, R. O. et al. Biomolecular simulation: a computational microscope for molecular biology. Annual review of biophysics, v. 41, p. 429–52, 2012. ISSN 1936-1238. Disponível em: . Citado 2 vezes nas páginas 33 e 34, 139 ALMEIDA, M. d. S. Ressonância Magnética Nuclear. In: Bioinformática: da Biologia à Flexibilidade Moleculares. 1a. ed. [S.l.: s.n.], 2014. cap. 12, p. 282. ISBN 978-85-69288-00-8. Citado 3 vezes nas páginas 33, 160 e 161, 140 KWAN, E. E.; HUANG, S. G. Structural Elucidation with NMR Spectroscopy: Practical Strategies for Organic Chemists. European Journal of Organic Chemistry, v. 2008, n. 16, p. 2671–2688, jun 2008. ISSN 1434193X. Disponível em: . Citado na página 33, Referências 184 141 BECKER, W. et al. Investigating Protein-Ligand Interactions by Solution Nuclear Magnetic Resonance Spectroscopy. Chemphyschem : a European journal of chemical physics and physical chemistry, Wiley-Blackwell, v. 19, n. 8, p. 895–906, 2018. ISSN 1439-7641. Disponível em: . Citado 3 vezes nas páginas 33, 34 e 35, 142 CALA, O.; GUILLIÈRE, F.; KRIMM, I. NMR-based analysis of protein–ligand interactions. 
Analytical and Bioanalytical Chemistry, v. 406, n. 4, p. 943–956, feb 2014. ISSN 1618-2642. Disponível em: . Citado 2 vezes nas páginas 34 e 35, 143 ARANTES, P. R. et al. Development of GROMOS-Compatible Parameter Set for Simulations of Chalcones and Flavonoids. The Journal of Physical Chemistry B, American Chemical Society, p. acs.jpcb.8b10139, jan 2019. ISSN 1520-6106. Disponível em: . Citado 5 vezes nas páginas 34, 54, 159, 160 e 161, 144 SCHIEBEL, J. et al. Intriguing role of water in protein-ligand binding studied by neutron crystallography on trypsin complexes. Nature Communications, Nature Publishing Group, v. 9, n. 1, p. 3559, dec 2018. ISSN 2041-1723. Disponível em: . Citado na página 34, 145 CRAIK, D. J. et al. NMR and Drug Discovery. In: Burger’s Medicinal Chemistry and Drug Discovery. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2003. Disponível em: . Citado na página 35, 146 MEYER, B.; PETERS, T. NMR Spectroscopy Techniques for Screening and Identifying Ligand Binding to Protein Receptors. Angewandte Chemie International Edition, v. 42, n. 8, p. 864–890, feb 2003. ISSN 14337851. Disponível em: . Citado na página 35, 147 KOVACS, H.; MOSKAU, D.; SPRAUL, M. Cryogenically cooled probes—a leap in NMR technology. Citado 3 vezes nas páginas 35, 160 e 161, 148 KOVACS, H.; MOSKAU, D. Cryogenic NMR Probes. In: Encyclopedia of Biophysics. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. p. 392–396. Disponível em: . Citado 3 vezes nas páginas 35, 160 e 161, 149 DAUBER-OSGUTHORPE, P.; HAGLER, A. T. Biomolecular force fields: where have we been, where are we now, where do we need to go and how do we get there? Journal of Computer-Aided Molecular Design, Springer International Publishing, p. 1–71, nov 2018. ISSN 0920-654X. Disponível em: . Citado 4 vezes nas páginas 35, 36, 37 e 158, 150 BERNAL, J. D.; FOWLER, R. H. A Theory of Water and Ionic Solution, with Particular Reference to Hydrogen and Hydroxyl Ions. 
The Journal of Chemical Physics, Referências 185 American Institute of Physics, v. 1, n. 8, p. 515–548, aug 1933. ISSN 0021-9606. Disponível em: . Citado na página 35, 151 BARTON, D. H. R. Interactions between non-bonded atoms, and the structure of cis-decalin. Journal of the Chemical Society (Resumed), The Royal Society of Chemistry, v. 0, n. 0, p. 340, jan 1948. ISSN 0368-1769. Disponível em: . Citado na página 35, 152 MASON, E. A.; KREEVOY, M. M. A Simple Model for Barriers to Internal Rotation. Journal of the American Chemical Society, American Chemical Society, v. 77, n. 22, p. 5808–5814, nov 1955. ISSN 0002-7863. Disponível em: . Citado na página 35, 153 PITZER, K. S.; DONATH, W. E. Conformations and Strain Energy of Cyclopentane and its Derivatives. Journal of the American Chemical Society, American Chemical Society, v. 81, n. 13, p. 3213–3218, jul 1959. ISSN 0002-7863. Disponível em: . Citado na página 35, 154 HENDRICKSON, J. B. Molecular Geometry. I. Machine Computation of the Common Rings. Journal of the American Chemical Society, American Chemical Society, v. 83, n. 22, p. 4537–4547, nov 1961. ISSN 0002-7863. Disponível em: . Citado na página 35, 155 WEINER, S. J. et al. An all atom force field for simulations of proteins and nucleic acids. Journal of Computational Chemistry, John Wiley & Sons, Ltd, v. 7, n. 2, p. 230–252, apr 1986. ISSN 01928651. Disponível em: . Citado na página 35, 156 CORNELL, W. D. et al. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. Journal of the American Chemical Society, American Chemical Society, v. 117, n. 19, p. 5179–5197, may 1995. ISSN 0002-7863. Disponível em: . Citado 3 vezes nas páginas 35, 38 e 40, 157 JORGENSEN, W. L.; TIRADO-RIVES, J. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. Journal of the American Chemical Society, American Chemical Society, v. 
110, n. 6, p. 1657–1666, mar 1988. ISSN 0002-7863. Disponível em: . Citado 3 vezes nas páginas 35, 40 e 157, 158 LINDORFF-LARSEN, K. et al. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins, Wiley-Blackwell, v. 78, n. 8, p. 1950–8, jun 2010. ISSN 1097-0134. Disponível em: . Citado 2 vezes nas páginas 35 e 38, 159 MACKERELL, A. D. et al. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. The Journal of Physical Chemistry B, American Referências 186 Chemical Society, v. 102, n. 18, p. 3586–3616, apr 1998. ISSN 1520-6106. Disponível em: . Citado 3 vezes nas páginas 35, 38 e 40, 160 VANOMMESLAEGHE, K. et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. Journal of computational chemistry, NIH Public Access, v. 31, n. 4, p. 671–90, mar 2010. ISSN 1096-987X. Disponível em: . Citado 3 vezes nas páginas 35, 37 e 156, 161 JORGENSEN, W. L.; MAXWELL, D. S.; TIRADO-RIVES, J. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. Journal of the American Chemical Society, American Chemical Society, v. 118, n. 45, p. 11225–11236, jan 1996. ISSN 0002-7863. Disponível em: . Citado 2 vezes nas páginas 35 e 38, 162 HARDER, E. et al. OPLS3: A Force Field Providing Broad Coverage of Drug-like Small Molecules and Proteins. Journal of Chemical Theory and Computation, American Chemical Society, v. 12, n. 1, p. 281–296, jan 2016. ISSN 1549-9618. Disponível em: . Citado 2 vezes nas páginas 35 e 37, 163 GUNSTEREN, W. F. van et al. Biomolecular Simulation: The GROMOS96 Manual and User Guide. Vdf Hochschulverlag AG an der ETH, p. 1–1042, 1996. Citado 4 vezes nas páginas 35, 38, 39 e 40, 164 SOARES, T. A. et al. An improved nucleic acid parameter set for the GROMOS force field. Journal of Computational Chemistry, Wiley Subscription Services, Inc., A Wiley Company, v. 
26, n. 7, p. 725–737, may 2005. ISSN 01928651. Disponível em: . Citado 2 vezes nas páginas 35 e 38, 165 OOSTENBRINK, C. et al. A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6. Journal of computational chemistry, v. 25, n. 13, p. 1656–76, oct 2004. ISSN 0192-8651. Disponível em: . Citado 7 vezes nas páginas 35, 38, 39, 45, 46, 52 e 157, 166 SCHMID, N. et al. Definition and testing of the GROMOS force-field versions 54A7 and 54B7. European biophysics journal : EBJ, v. 40, n. 7, p. 843–56, jul 2011. ISSN 1432-1017. Disponível em: . Citado 4 vezes nas páginas 35, 38, 39 e 156, 167 LEACH, A. R. Molecular modelling : principles and applications. [S.l.]: Longman, 1996. 595 p. ISBN 9780582239333. Citado 3 vezes nas páginas 36, 40 e 41, 168 CALEMAN, C. et al. Force Field Benchmark of Organic Liquids: Density, Enthalpy of Vaporization, Heat Capacities, Surface Tension, Isothermal Compressibility, Volumetric Expansion Coefficient, and Dielectric Constant. Referências 187 Journal of chemical theory and computation, v. 8, n. 1, p. 61–74, jan 2012. ISSN 1549-9626. Disponível em: . Citado 5 vezes nas páginas 36, 38, 48, 56 e 158, 169 RINIKER, S. Fixed-Charge Atomistic Force Fields for Molecular Dynamics Simulations in the Condensed Phase: An Overview. Journal of Chemical Information and Modeling, American Chemical Society, v. 58, n. 3, p. 565–578, mar 2018. ISSN 1549-9596. Disponível em: . Citado 3 vezes nas páginas 36, 39 e 41, 170 RAPPE, A. K. et al. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American Chemical Society, American Chemical Society, v. 114, n. 25, p. 10024–10035, dec 1992. ISSN 0002-7863. Disponível em: . Citado na página 36, 171 HALGREN, T. A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. Journal of Computational Chemistry, John Wiley & Sons, Ltd, v. 17, n. 
5-6, p. 490–519, apr 1996. ISSN 01928651. Citado na página 36, 172 WANG, J. M. et al. Development and testing of a general amber force field. J. Comput. Chem., v. 25, n. 9, p. 1157–1174, 2004. ISSN 0192-8651. Citado 2 vezes nas páginas 37 e 156, 173 ROOS, K. et al. OPLS3e: Extending Force Field Coverage for Drug-Like Small Molecules. Journal of Chemical Theory and Computation, American Chemical Society, p. acs.jctc.8b01026, feb 2019. ISSN 1549-9618. Disponível em: . Citado na página 37, 174 MALDE, A. K. et al. An Automated Force Field Topology Builder (ATB) and Repository: Version 1.0. Journal of Chemical Theory and Computation, American Chemical Society, v. 7, n. 12, p. 4026–4037, dec 2011. ISSN 1549-9618. Disponível em: . Citado 2 vezes nas páginas 37 e 156, 175 STROET, M. et al. Automated Topology Builder Version 3.0: Prediction of Solvation Free Enthalpies in Water and Hexane. Journal of Chemical Theory and Computation, American Chemical Society, v. 14, n. 11, p. 5834–5845, nov 2018. ISSN 1549-9618. Disponível em: . Citado 2 vezes nas páginas 37 e 157, 176 BLEIZIFFER, P.; SCHALLER, K.; RINIKER, S. Machine Learning of Partial Charges Derived from High-Quality Quantum-Mechanical Calculations. Journal of Chemical Information and Modeling, American Chemical Society, v. 58, n. 3, p. 579–590, mar 2018. ISSN 1549-9596. Disponível em: . Citado na página 37, 177 HEHRE, W.; POPLE, J. Atomic electron populations for some simple molecules. Chemical Physics Letters, North-Holland, v. 2, n. 6, p. 379–380, oct 1968. ISSN Referências 188 0009-2614. Disponível em: . Citado na página 38, 178 HOFFMANN, R.; IMAMURA, A. Quantum mechanical approach to the conformational analysis of macromolecules in ground and excited states. Biopolymers, John Wiley & Sons, Ltd, v. 7, n. 2, p. 207–213, feb 1969. ISSN 0006-3525. Disponível em: . Citado na página 38, 179 MOMANY, F. A. et al. Energy parameters in polypeptides. III. 
Semiempirical molecular orbital calculations for hydrogen-bonded model peptides. The Journal of Physical Chemistry, American Chemical Society, v. 74, n. 12, p. 2424–2438, jun 1970. ISSN 0022-3654. Disponível em: . Citado na página 38, 180 COX, S. R.; WILLIAMS, D. E. Representation of the molecular electrostatic potential by a net atomic charge model. Journal of Computational Chemistry, John Wiley & Sons, Ltd, v. 2, n. 3, p. 304–323, 1981. ISSN 0192-8651. Disponível em: . Citado na página 38, 181 MOMANY, F. A. et al. Energy parameters in polypeptides. IV. Semiempirical molecular orbital calculations of conformational dependence of energy and partial charge in di- and tripeptides. The Journal of physical chemistry, v. 75, n. 15, p. 2286–97, jul 1971. ISSN 0022-3654. Disponível em: . Citado na página 38, 182 BECKE, A. D. Perspective: Fifty years of density-functional theory in chemical physics. The Journal of Chemical Physics, American Institute of Physics, v. 140, n. 18, p. 18A301, may 2014. ISSN 0021-9606. Disponível em: . Citado na página 38, 183 ST-AMANT, A.; SALAHUB, D. R. New algorithm for the optimization of geometries in local density functional theory. Chemical Physics Letters, North-Holland, v. 169, n. 5, p. 387–392, jun 1990. ISSN 0009-2614. Disponível em: . Citado na página 38, 184 HORTA, B. A. C. et al. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. Journal of Chemical Theory and Computation, American Chemical Society, v. 12, n. 8, p. 3825–3850, aug 2016. ISSN 1549-9618. Disponível em: . Citado 4 vezes nas páginas 38, 45, 56 e 158, 185 POLêTO, M. D. et al. Aromatic rings commonly used in medicinal chemistry: Force fields comparison and interactions with water toward the design of new chemical entities. Frontiers in Pharmacology, v. 9, p. 395, 2018. ISSN 1663-9812. Disponível em: . Citado 6 vezes nas páginas 38, 46, 51, 157, 158 e 159, Referências 189 186 HANSEN, H. S.; HÜNENBERGER, P. H. 
A reoptimized GROMOS force field for hexopyranose-based carbohydrates accounting for the relative free energies of ring conformers, anomers, epimers, hydroxymethyl rotamers, and glycosidic linkage conformers. Journal of Computational Chemistry, Wiley Subscription Services, Inc., A Wiley Company, v. 32, n. 6, p. 998–1032, apr 2011. ISSN 01928651. Disponível em: . Citado na página 38, 187 HARTREE, D. R.; HARTREE, W. Self-Consistent Field, with Exchange, for Beryllium. P. Roy. Soc. A - Math. Phy., The Royal Society, v. 150, n. 869, p. 9–33, may 1935. ISSN 1364-5021. Disponível em: . Citado 2 vezes nas páginas 38 e 47, 188 PETERSSON, G. A. et al. A complete basis set model chemistry. I. The total energies of closed-shell atoms and hydrides of the first-row elements. J Chem Phys, American Institute of Physics, v. 89, n. 4, p. 2193–2218, aug 1988. ISSN 0021-9606. Disponível em: . Citado 2 vezes nas páginas 38 e 45, 189 MØLLER, C.; PLESSET, M. S. Note on an Approximation Treatment for ManyElectron Systems. Phys. Rev., American Physical Society, v. 46, n. 7, p. 618–622, oct 1934. ISSN 0031-899X. Disponível em: . Citado 2 vezes nas páginas 38 e 45, 190 MAIER, J. A. et al. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. Journal of Chemical Theory and Computation, v. 11, n. 8, p. 3696–3713, aug 2015. ISSN 1549-9618. Disponível em: . Citado na página 38, 191 HUANG, J.; MACKERELL, A. D. CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. Journal of Computational Chemistry, v. 34, n. 25, p. 2135–2145, sep 2013. ISSN 01928651. Disponível em: . Citado na página 38, 192 W. F. van Gunsteren; BERENDSEN, H. J. C. Groningen Molecular Simulation (GROMOS) Library Manual. Biomos, Groningen, The Netherlands, p. 1–221, 1987. Citado na página 38, 193 HORTA, B. A. C. et al. 
Annexes

ANNEX A – Work developed during the doctorate

A.1 Homology modeling and molecular dynamics provide structural insights into tospovirus nucleoprotein

Tospovirus is a genus of the family Bunyaviridae that infects plants and causes severe losses in several crops in South America. Like all members of the genus, tospoviruses have a trisegmented ssRNA genome that encodes the viral RNA polymerase (RdRp), two glycoproteins (Gn/Gc), a movement protein (NSm), an RNA silencing suppressor protein (NSs) and a nucleoprotein (N). Despite multiple studies on this family, little is known about the molecular basis of the binding between the N protein and the genomic RNA, particularly its structural aspects. The structure of the N nucleoprotein was therefore modeled on the homologous template from LACV (La Crosse virus, Orthobunyavirus), yielding a monomeric three-dimensional structure composed of flexible N-terminal and C-terminal regions and a globular core containing a positively charged domain to which the RNA molecule binds. This structural model allowed the identification of a model for RNA-nucleoprotein interaction and for multimerization. Molecular dynamics simulations revealed the crucial role of the N-terminal portion in stabilizing the RNA molecule bound to the nucleoprotein. Ultimately, the model proposed in this work is corroborated by all mutational studies performed on the N nucleoprotein to date. The data collected will contribute to a better methodological design of functional or site-directed mutagenesis studies on the N nucleoprotein. Furthermore, the proposed model supports the identification of amino acids essential for the formation of the RNA-nucleoprotein complex and, consequently, the development of strategies for tospovirus control.
The Author(s) BMC Bioinformatics 2016, 17(Suppl 18):489
DOI 10.1186/s12859-016-1339-4

RESEARCH Open Access

Homology modeling and molecular dynamics provide structural insights into tospovirus nucleoprotein

Rayane Nunes Lima1†, Muhammad Faheem2,3†, João Alexandre Ribeiro Gonçalves Barbosa2,3, Marcelo Depólo Polêto4, Hugo Verli4, Fernando Lucas Melo1 and Renato Oliveira Resende1*

From 11th International Conference of the AB3C + Brazilian Symposium of Bioinformatics, São Paulo, Brazil. 3-6 November 2015

Abstract

Background: Tospovirus is a plant-infecting genus within the family Bunyaviridae, which also includes four animal-infecting genera: Hantavirus, Nairovirus, Phlebovirus and Orthobunyavirus. Compared to these members, the structures of Tospovirus proteins are still poorly understood. Although multiple studies have attempted to identify candidate N protein regions involved in RNA binding and protein multimerization for tospovirus using yeast two-hybrid systems (Y2HS) and site-directed mutagenesis, the tospovirus ribonucleocapsids (RNPs) remain largely uncharacterized at the molecular level, and the lack of structural information prevents detailed insight into these interactions.

Results: Here we used the nucleoprotein structure of LACV (La Crosse virus, Orthobunyavirus) and molecular dynamics simulations to assess the structure and dynamics of the nucleoprotein from the tospovirus GRSV (Groundnut ringspot virus). The resulting model is a monomer composed of flexible N-terminal and C-terminal arms and a globular domain with a positively charged groove in which RNA is deeply encompassed. This model allowed the identification of candidate amino acid residues involved in RNA interaction and N-N multimerization. Moreover, most residues predicted to be involved in these interactions are highly conserved among tospoviruses.
Conclusions: Crucially, the interaction model proposed here for GRSV N is further corroborated by all mutational studies available so far on TSWV (Tomato spotted wilt virus) N. Our data will help in designing further and more accurate mutational and functional studies of tospovirus N proteins. In addition, the proposed model may shed light on the mechanisms of RNP shaping and could allow the identification of essential amino acid residues as potential targets for tospovirus control strategies.

Keywords: Homology modeling, Molecular dynamics, Tospovirus, Nucleoprotein

* Correspondence: rresende@unb.br
†Equal contributors
1Laboratório de Virologia Vegetal, Departamento de Biologia Celular, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil
Full list of author information is available at the end of the article

© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Tospovirus is a thrips-borne plant-infecting genus within the family Bunyaviridae, which also includes four animal-infecting genera: Hanta/Nairo/Phlebo- and Orthobunyavirus [1]. GRSV (Groundnut ringspot virus) is an emerging tospovirus that has caused severe diseases in distinct vegetable crops in South America and is phylogenetically close to the tospovirus type species TSWV (Tomato spotted wilt virus) [2]. Like all tospoviruses, GRSV contains a trisegmented negative single-stranded RNA (ssRNA) genome that encodes the viral RNA-dependent RNA polymerase (RdRp), two glycoproteins (Gn/Gc), the movement protein (NSm), the RNA silencing suppressor protein (NSs) and the nucleoprotein (N) [3]. N is a multifunctional protein involved in RNA protection, particle assembly and intracellular movement, and might play a role in transcription/replication regulation [4–14]. Multiple copies of the N protein form oligomers that interact with the viral RNAs to build ribonucleoprotein complexes (RNPs), which are proposed to be transported via plasmodesmata and are functional templates for RNA replication and transcription [6, 15, 16].

Multiple studies have attempted to identify candidate N protein regions involved in RNA binding and protein multimerization for TSWV using yeast two-hybrid systems (Y2HS) and site-directed mutagenesis [4, 6, 17, 18], but the tospovirus RNPs remain largely uncharacterized at the molecular level, and the lack of structural information prevents detailed insight into these interactions. The lack of a reverse genetics system, which is available for other bunyaviruses, has hampered tospovirus research. The N protein crystal structures of related RNA virus families (Arena/Orthomyxo/Bunyaviridae) have been elucidated [8, 19–26] and, despite different sizes and distinct N folds, there are common features and architectural principles by which these proteins form N-N multimers and N-RNA complexes [27]. Therefore, these available structures were used to predict a three-dimensional model for GRSV N (the most important and prevalent tospovirus in Brazil) using homology modeling.
Results and discussion

Three-dimensional model of GRSV N and oligomerization

GRSV N and LACV N have a similar protein fold, with the predicted GRSV N monomer forming thirteen helical segments and two small beta-sheets (Figs. 1, 2a and e-f). The protein has a globular core domain (26–223 aa) containing a deep positively charged groove, with the two chain termini forming an N-terminal arm (1–25 aa) and a C-terminal arm (224–258 aa) (Fig. 2a-b and Fig. 3). The N- and C-arms extend outwards from the globular core domain and interact with the globular core domains of neighboring monomers to mediate multimerization, supporting the “head-to-tail” model proposed by [18]. Amino acids S2-V12 of the N-arm interact with Q61-N82 of the core domain of one neighboring monomer (Fig. 2c and e), while K227-K249 of the C-arm interact with K173–K198 of the core domain of the other neighboring monomer (Fig. 2d and f). Specific residue-residue interactions are listed in Table 1 for the two independent interfaces. According to PISA, the intermolecular interactions were mainly hydrogen bonds, but van der Waals and hydrophobic interactions also contribute to holding the monomers together (data not shown).

Fig. 1 Groundnut ringspot virus (GRSV) and La Crosse virus (LACV) nucleoprotein sequence alignment. Key residues for GRSV N and LACV N oligomerization and for ssRNA binding are colored as indicated by the colored bars. The secondary structure of LACV N is shown above, and every 10 residues are indicated with a dot (.). Strictly conserved residues are highlighted in red with white letters and highly conserved residues are displayed in red letters. The GRSV N GenBank accession number is AF251271 and the LACV N UniProt accession code is P04873.

Fig. 2 Monomeric and tetrameric structure of the Groundnut ringspot virus (GRSV) nucleoprotein (N). a Cartoon representation of monomeric GRSV N with rainbow coloring from N- (blue) to C-terminus (red). b Electrostatic surface of GRSV N with a positively charged groove in complex with RNA, shown as yellow (carbons) and red (oxygens) sticks. Positive and negative charges are blue and red, respectively. c N-terminus interaction surface representation of four GRSV N monomers A, B, C, D shown in pink, yellow, cyan and green, respectively. d C-terminus interaction surface representation of the GRSV N tetrameric ring. The RNA is shown as black sticks deeply bound inside the tetrameric ring. e Cartoon representation of the N-arm oligomerization interface showing interacting residues. The N-terminal arm is in pink and the globular region is in green. The intermolecular hydrogen bonds are shown as yellow dotted lines. f 180° rotation of Fig. 2e, C-arm oligomerization interface showing interacting residues. The C-terminal arm is in cyan and the globular region is in green.

This interaction model is further corroborated by the available mutational studies on TSWV N [4, 17, 18]. The first assay to map functional domains of TSWV N, performed with Y2HS and random serial deletions, showed that both the N- (1–39 aa) and C-terminal (233–248 aa) regions were important for N-N interaction [18], in clear agreement with the structural results presented here. Furthermore, [17] identified three crucial intermonomer binding regions, 42–56, 132–152 and 222–248, which clearly correspond to the predicted interaction residues of GRSV N located at the N- and C-arms or buried in the core of the model (Fig. 3). Moreover, amino acid residues located in the regions K103-A119 and L132-V135 are solvent accessible and are therefore able to interact with NSm, glycoproteins, viral polymerase or host proteins [6, 7]. Recently, studies attempting to identify N-NSm interactions have been performed [28, 29], and their results are fully consistent with the GRSV N protein model. In both cases, the model proposed here represents an efficient tool to assist in planning experiments with mutations and deletions in the N protein.

Fig. 3 Sequence alignment of representative tospovirus nucleoproteins (N). The secondary structure of Groundnut ringspot virus (GRSV) is shown above and that of La Crosse virus (LACV) is shown at the bottom. Key residues for GRSV N ssRNA binding are marked with yellow triangles. GRSV N- and C-arms are marked with blue and green boxes, respectively, with key residues for oligomerization highlighted. Strictly conserved residues are highlighted in red with white letters and highly conserved residues in red letters. I: Tospovirus American clade I; II: Tospovirus American clade II; III: Tospovirus Eurasian clade; IV: Orthobunyavirus. The sequence codes are supplied in Additional file 2: Table S1.

In addition, the model obtained for the N protein was submitted to molecular dynamics simulations in order both to refine the structure in aqueous solvent [30, 31] and to assess the protein conformational ensemble, further exploring its structural and functional roles. During the simulation time, the globular core domain did not show any loss of secondary structure, increase in radius of gyration or persistent increase in RMSD values, which supports the model quality. It is worth mentioning that RMSF calculations indicate that the N-terminal arm (1–25 aa) is a very flexible region (Fig. 4c).
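The three stability measures named above (RMSD, radius of gyration and RMSF) can be illustrated with a minimal sketch. It is written in plain Python on a made-up three-atom trajectory and is not the authors' analysis pipeline: the coordinates are hypothetical, the frames are assumed to be already superimposed on the reference, and all atoms are given equal mass.

```python
import math

def centroid(points):
    """Arithmetic mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rmsd(frame, reference):
    """Root mean square deviation between two frames (no fitting is done here)."""
    sq = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(reference))

def radius_of_gyration(frame):
    """Unweighted radius of gyration (assumption: equal atomic masses)."""
    c = centroid(frame)
    sq = sum((a - b) ** 2 for p in frame for a, b in zip(p, c))
    return math.sqrt(sq / len(frame))

def rmsf(trajectory):
    """Per-atom root mean square fluctuation around the time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = centroid([frame[i] for frame in trajectory])
        sq = sum((a - b) ** 2 for frame in trajectory for a, b in zip(frame[i], mean))
        values.append(math.sqrt(sq / n_frames))
    return values

# Hypothetical toy trajectory: atoms 0 and 1 are rigid ("core"),
# atom 2 swings back and forth ("flexible arm").
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, -1.0, 0.0)],
]

reference = trajectory[0]
print([round(rmsd(frame, reference), 3) for frame in trajectory])
print([round(radius_of_gyration(frame), 3) for frame in trajectory])
print([round(v, 3) for v in rmsf(trajectory)])
```

In this toy case only the third atom has a non-zero RMSF, which is the kind of signature the text attributes to the N-terminal arm. A real analysis would normally fit each frame to the reference first and use mass-weighted coordinates.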
Table 1 Pairs of interacting residues for GRSV N-N oligomerization

N-arm (a) | N-arm binding site (b) | C-arm (c) | C-arm binding site (b)
S2 (d) | S83 | A226 | K183
V4 | N82 | S229 | K183
T7 | Q61 | D233 | T186
T7 | S62 | Y235 | K183
K8 | S62 | N238 | K198
N10 | T73 | Y243 | N185
V12 | T73 | V246 | K175
V12 | G75 | V248 | K173
- | - | K249 | Y174

(a) N-arm amino acid residues of GRSV N
(b) Interacting amino acid residues of the GRSV N globular core domain
(c) C-arm amino acid residues of GRSV N
(d) Amino acid residue position in the GRSV N sequence

RNA interaction

According to the GRSV N protein model, the RNA is primarily bound at the central RNA-binding groove (Fig. 2b), and the key residues for this interaction (K3, K5, Q17, K58, R60, Q61, R94, R95, K183, Y184, K187, K192 and K227) are mainly located in this positively charged groove. This positively charged groove is only possible because residues F37, F56, F72, F74, I79, M91, F93 and L96 form a hydrophobic core, which is indispensable to stabilize the protein fold and to correctly orient the RNA-interacting residues towards the groove. Importantly, these residues are highly conserved among all tospoviruses (Fig. 3). Note that the N-terminal arm is also involved in RNA binding and in shielding the RNA from the solvent (Fig. 2c-d). Residues F23, L54, F56, L57 and F93 were observed to modulate the RNA nucleobase dynamics during the simulation, while the N-terminal arm seems to play a stabilizing role during MD simulations of GRSV N protein (Fig. 4a and b).

Fig. 4 Molecular dynamics of the monomeric nucleoprotein (N) of Groundnut ringspot virus (GRSV). Root mean square deviation (RMSD) calculations for different sets of atoms in the presence (a) and absence (b) of RNA. c Root mean square fluctuation (RMSF) calculations for the entire N protein in the presence (red) and absence (black) of RNA. d Plot of α-helix content as a function of time in the presence (red) and absence (black) of RNA
In addition, the content of alpha-helices in GRSV N protein bound to RNA increased by 25 % during the simulation in comparison to the free monomer (Fig. 4d), suggesting that, on the simulated timescale, the monomeric state does not present a lack of conformational stability relative to the oligomeric states, as observed experimentally for other viruses [32, 33]. Recently, the residues R60, R94, and R95 were confirmed to interact with RNA [33], which also supports our results. RNA is strongly bent at each N-N interface and is largely solvent-inaccessible in the tetramer (Fig. 2d). The dimensions of the groove can accommodate ssRNA, and PISA analysis showed that the majority of residue-nucleotide interactions occur with the ribose and phosphate moieties, suggesting a non-sequence-specific RNA interaction. Indeed, Richmond et al. [4] carried out mutagenesis and gel shift assay studies to identify N regions important for ssRNA binding and demonstrated that the N-RNA complex is highly stable and non-sequence-specific, further supporting these results.

Conclusions

Taken together, these data will help in designing further and more accurate mutational and functional studies of tospovirus N proteins. In addition, the proposed model may shed light on the mechanisms of RNP shaping and could allow the identification of essential amino acid residues as potential targets for tospovirus control strategies.

Methods

In silico homology modeling and model optimization

A template for modeling the GRSV N protein was searched for in the ExPASy SWISS-MODEL server [34] using the amino acid sequence of GRSV N as a reference. Template crystal structures from the Orthobunyavirus genus were chosen due to their genetic relationship.
The LACV (La Crosse virus, Orthobunyavirus) N tetrameric crystal structure in complex with ssRNA (PDB ID 4BHH) was selected as the template [20] and aligned with GRSV N using the T-Coffee server [35], and the resulting alignment was manually improved using BioEdit [36]. The aligned sequences were used with MODELLER v9.10 [37] to build high-quality tetrameric models with or without RNA. The models were optimized using the energy minimization protocols available at the Yasara [38] and Chiron [39] servers. The quality of the 3D models was evaluated with ERRAT (version 2.0) [40] and MolProbity [41]. Ramachandran plots for the models were assessed and Ramachandran outlier residues were fixed with COOT [42] and energy minimization. The highest-quality model, with 90.1 % of residues in favored regions, 8.4 % in allowed regions and 1.5 % outliers in the Ramachandran plot, was selected after visual inspection (see Additional file 1: Figure S1). The model was submitted to the PISA program [43] at the EBI-EMBL server for interface analysis, and the retrieved PISA data were analyzed for binding patterns using PyMOL [44].

Molecular dynamics

Molecular dynamics techniques were applied using the GROMACS suite [45] in order to evaluate the stability and consistency of the obtained N protein monomeric model and to investigate GRSV N protein-RNA interactions over time. To this end, the N protein model was simulated in the presence and in the absence of the modeled RNA, as two separate systems. The Amber99SB-ILDN force field [46] was used to generate the topologies. The models were placed at the center of a dodecahedral box and solvated with the TIP3P water model [47]. Counterions were used to neutralize the net charge of the system, and 0.15 M NaCl was added to the box in order to mimic the cellular ionic environment. After a minimization protocol using steepest descent and conjugate gradient to eliminate possible clashes and bad contacts, an NVT equilibration with restraint forces of 1000 kJ/mol was carried out for 4 ns at 300 K.
Moreover, five subsequent equilibration steps in the NPT ensemble were carried out at 1 bar with restraint forces of 800 kJ/mol on heavy atoms, 600 Kcal/(mol x nm) and 400 kJ/mol on mainchain, 200 kJ/mol on backbone and 100 kJ/mol on alpha-carbons, totaling 13 ns. Finally, production runs with no restraints were carried out for 50 ns using an integration step of 2 fs and the LINCS algorithm [48]. The Particle Mesh Ewald method [49] was also applied for Coulombic and Lennard-Jones interactions beyond 1 nm.

Additional files

Additional file 1: Ramachandran plot analysis of the predicted structure of Groundnut ringspot virus (GRSV) N protein. The regions enclosed by light blue lines are the most favored regions, while the regions enclosed by dark blue lines are allowed regions. The remaining regions of the plot are disallowed. The pink dots show the outliers (PNG 87 kb)
Additional file 2: The GenBank accession numbers of the viruses used in this work (DOCX 19 kb)

Declarations

This article has been published as part of BMC Bioinformatics Volume 17 Supplement 18, 2016. Proceedings of X-meeting 2015: 11th International Conference of the AB3C + Brazilian Symposium on Bioinformatics: bioinformatics. The full contents of the supplement are available online https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-18.

Funding

Publication of this paper has been funded by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), and FAPDF (Fundação de Apoio à Pesquisa do Distrito Federal), Brazil.

Availability of data and materials

The datasets supporting the conclusions of this article are included within the article and its supplementary files.

Authors’ contributions

Conceived and designed the experiments: JARGB MF RNL. Performed the homology modeling: MF RNL. Performed the Molecular Dynamics: HV MDP. Analyzed the data: FLM HV JARGB MDP MF RNL ROR.
Wrote the paper: FLM HV JARGB MDP MF RNL ROR. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author details

1Laboratório de Virologia Vegetal, Departamento de Biologia Celular, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil. 2Laboratório de Biofísica, Departamento de Biologia Celular, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil. 3Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil. 4Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil.

Published: 15 December 2016

References

1. Walter CT, Barr JN. Recent advances in the molecular and cellular biology of bunyaviruses. J Gen Virol. 2011;92(Pt 11):2467–84. 2. de Avila AC, de Haan P, Kormelink R, Resende Rde O, Goldbach RW, Peters D. Classification of tospoviruses based on phylogeny of nucleoprotein gene sequences. J Gen Virol. 1993;74(Pt 2):153–9. 3. Pappu HR, Jones RA, Jain RK. Global status of tospovirus epidemics in diverse cropping systems: successes achieved and challenges ahead. Virus Res. 2009;141(2):219–36. 4. Richmond KE, Chenault K, Sherwood JL, German TL. Characterization of the nucleic acid binding properties of tomato spotted wilt virus nucleocapsid protein. Virology. 1998;248(1):6–11. 5. Ribeiro D, Borst JW, Goldbach R, Kormelink R. Tomato spotted wilt virus nucleocapsid protein interacts with both viral glycoproteins Gn and Gc in planta. Virology. 2009;383(1):121–30. 6. Soellick T, Uhrig JF, Bucher GL, Kellmann JW, Schreier PH. The movement protein NSm of tomato spotted wilt tospovirus (TSWV): RNA binding, interaction with the TSWV N protein, and identification of interacting plant proteins.
Proc Natl Acad Sci U S A. 2000;97(5):2373–8. 7. Feng Z, Chen X, Bao Y, Dong J, Zhang Z, Tao X: Nucleocapsid of Tomato spotted wilt tospovirus forms mobile particles that traffic on an actin/ endoplasmic reticulum network driven by myosin XI-K. New Phytol. 2013; 200(4):1212-24. 8. Ariza A, Tanner SJ, Walter CT, Dent KC, Shepherd DA, Wu W, et al. Nucleocapsid protein structures from orthobunyaviruses reveal insight into ribonucleoprotein architecture and RNA polymerization. Nucleic Acids Res. 2013;41(11):5912–26. 9. Guu TS, Zheng W, Tao YJ. Bunyavirus: structure and replication. Adv Exp Med Biol. 2012;726:245–66. 10. de Oliveira AS, Melo FL, Inoue-Nagata AK, Nagata T, Kitajima EW, Resende RO. Characterization of bean necrotic mosaic virus: a member of a novel evolutionary lineage within the genus tospovirus. PLoS One. 2012;7(6):e38634. 11. Snippe M, Willem Borst J, Goldbach R, Kormelink R. Tomato spotted wilt virus Gc and N proteins interact in vivo. Virology. 2007;357(2):115–23. 12. Mir MA, Panganiban AT. The bunyavirus nucleocapsid protein is an RNA chaperone: possible roles in viral RNA panhandle formation and genome replication. RNA (New York, NY). 2006;12(2):272–82. 13. Mir MA, Panganiban AT. The hantavirus nucleocapsid protein recognizes specific features of the viral RNA panhandle and is altered in conformation upon RNA binding. J Virol. 2005;79(3):1824–35. 14. Brennan B, Welch SR, Elliott RM. The consequences of reconfiguring the ambisense S genome segment of rift valley fever virus on viral replication in mammalian and mosquito cells and for genome packaging. PLoS Pathog. 2014;10(2):e1003922. 15. Li W, Lewandowski DJ, Hilf ME, Adkins S. Identification of domains of the tomato spotted wilt virus NSm protein involved in tubule formation, movement and symptomatology. Virology. 2009;390(1):110–21. 16. Singh P, Indi SS, Savithri HS. Groundnut bud necrosis virus encoded NSm associates with membranes via its C-terminal domain. PLoS One. 2014;9(6): e99370. 17. 
Kainz M, Hilson P, Sweeney L, Derose E, German TL. Interaction between tomato spotted wilt virus N protein monomers involves nonelectrostatic forces governed by multiple distinct regions in the primary structure. Phytopathology. 2004;94(7):759–65. 18. Uhrig JF, Soellick TR, Minke CJ, Philipp C, Kellmann JW, Schreier PH. Homotypic interaction and multimerization of nucleocapsid protein of tomato spotted wilt tospovirus: identification and characterization of two interacting domains. Proc Natl Acad Sci U S A. 1999;96(1):55–60. 19. Zheng W, Olson J, Vakharia V, Tao YJ. The crystal structure and RNA-binding of an orthomyxovirus nucleoprotein. PLoS Pathog. 2013;9(9):e1003624. 20. Reguera J, Malet H, Weber F, Cusack S. Structural basis for encapsidation of genomic RNA by La Crosse orthobunyavirus nucleoprotein. Proc Natl Acad Sci U S A. 2013;110(18):7246–51. 21. Dong H, Li P, Elliott RM, Dong C. Structure of schmallenberg orthobunyavirus nucleoprotein suggests a novel mechanism of genome encapsidation. J Virol. 2013;87(10):5593–601. 22. Niu F, Shaw N, Wang YE, Jiao L, Ding W, Li X, et al. Structure of the leanyer orthobunyavirus nucleoprotein-RNA complex reveals unique architecture for RNA encapsidation. Proc Natl Acad Sci U S A. 2013;110(22):9054–9. 23. Raymond DD, Piper ME, Gerrard SR, Skiniotis G, Smith JL. Phleboviruses encapsidate their genomes by sequestering RNA bases. Proc Natl Acad Sci U S A. 2012;109(47):19208–13. 24. Carter SD, Surtees R, Walter CT, Ariza A, Bergeron E, Nichol ST, et al. Structure, function, and evolution of the Crimean-Congo hemorrhagic fever virus nucleocapsid protein. J Virol. 2012;86(20):10914–23. 25. Ferron F, Li Z, Danek EI, Luo D, Wong Y, Coutard B, et al. The hexamer structure of rift valley fever virus nucleoprotein suggests a mechanism for its assembly into ribonucleoprotein complexes. PLoS Pathog. 2011;7(5):e1002030. 26. Brunotte L, Kerber R, Shang W, Hauer F, Hass M, Gabriel M, et al. 
Structure of the Lassa virus nucleoprotein revealed by X-ray crystallography, smallangle X-ray scattering, and electron microscopy. J Biol Chem. 2011;286(44): 38748–56. 27. Reguera J, Cusack S, Kolakofsky D. Segmented negative strand RNA virus nucleoprotein structure. Curr Opin Virol. 2014;5:7–15. 28. Tripathi D, Raikhy G, Pappu HR. Movement and nucleocapsid proteins coded by two tospovirus species interact through multiple binding regions in mixed infections. Virology. 2015;478:137-47. 29. Leastro MO, Pallas V, Resende RO, Sanchez-Navarro JA. The movement proteins (NSm) of distinct tospoviruses peripherally associate with cellular membranes and interact with homologous and heterologous NSm and nucleocapsid proteins. Virology. 2015;478c:39–49. 30. Kairys V, Gilson MK, Fernandes MX. Using protein homology models for structure-based studies: approaches to model refinement. TheScientificWorldJOURNAL. 2006;6:1542–54. 31. Sellers BD, Nilmeier JP, Jacobson MP. Antibodies as a model system for comparative model refinement. Proteins. 2010;78(11):2490–505. 32. Dong H, Li P, Bottcher B, Elliott RM, Dong C. Crystal structure of schmallenberg orthobunyavirus nucleoprotein-RNA complex reveals a novel RNA sequestration mechanism. RNA (New York, NY). 2013;19(8):1129–36. 33. Li J, Feng Z, Wu J, Huang Y, Lu G, Zhu M, et al. Structure and function analysis of nucleocapsid protein of tomato spotted wilt virus interacting with RNA using homology modeling. J Biol Chem. 2015;290(7):3950–61. 34. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42(Web Server issue): W252–8. 35. Notredame C, Higgins DG, Heringa J. T-coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000;302(1):205–17. 36. Hall T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for windows 95/98/NT. 
Nucleic Acids Symp Ser. 1999;41:95–8. 37. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993;234(3):779–815. 38. Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, et al. Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: four approaches that performed well in CASP8. Proteins. 2009;77 Suppl 9:114–22. 39. Ramachandran S, Kota P, Ding F, Dokholyan NV. Automated minimization of steric clashes in protein structures. Proteins. 2011;79(1):261–70. 40. Colovos C, Yeates TO. Verification of protein structures: patterns of nonbonded atomic interactions. Protein sci pub protein soc. 1993;2(9):1511–9. 41. Chen VB, Arendall 3rd WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr. 2010;66(Pt 1):12–21. 42. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of coot. Acta Crystallogr D Biol Crystallogr. 2010;66(Pt 4):486–501. 43. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372(3):774–97. 44. Delano W. The PyMOL molecular graphics system. San Carlos: DeLano Scientific; 2002. 45. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, et al. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25. 46. Lindorff-Larsen K, Piana S, Palmo K, Maragakis P, Klepeis JL, Dror RO, et al. Improved side-chain torsion potentials for the amber ff99SB protein force field. Proteins. 2010;78(8):1950–8. 47. Jorgensen WL, Madura JD. Quantum and statistical mechanical studies of liquids. 25. Solvation and conformation of methanol in water. J Am Chem Soc. 1983;105(6):1407–13. 48. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM. LINCS: a linear constraint solver for molecular simulations. J Comput Chem. 1997;18(12):1463–72. 49. 
Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A smooth particle mesh Ewald method. J Chem Phys. 1995;103(19):8577–93.

APPENDIX A. Work developed during the doctorate

A.2 Influence of Na+ and Mg2+ ions on RNA structures studied with molecular dynamics simulations

The structure of ribonucleic acids (RNA) depends directly on the presence of Mg2+ cations, both for the proper stabilization of chemical interactions and for the electrostatic shielding needed to minimize repulsion within the polymer. These ions are observable only in structures obtained by high-resolution X-ray crystallography, which hampers molecular dynamics studies of RNA structures in which these ions are not resolved. In this work, molecular dynamics simulations of 24 RNA structures were performed under varying ionic conditions. Twelve helical structures and 12 structures with complex folds were selected, both sets drawn from X-ray crystallography as well as from NMR experiments. The aim of the study is to predict the positions of these cations in RNA structures and to evaluate the impact of Na+ and Mg2+ ions, at different ionic strengths, on the structural stability of RNAs. The results suggest that the presence of Mg2+ better preserves the experimental structure and that the simulations reproduce the experimental ion positions with reasonable accuracy.
Furthermore, different cation binding modes were observed in helical and complex folded structures, and the free energies for breaking RNA-ion interactions (ΔG‡) range from 10 to 26 kJ/mol for different RNA atoms. These conclusions demonstrate the complexity of simulating RNAs with distinct folds and the real impact of the often neglected choice of ionic concentration when building the system under study.

4872–4882 Nucleic Acids Research, 2018, Vol. 46, No. 10 doi: 10.1093/nar/gky221 Published online 30 April 2018

Influence of Na+ and Mg2+ ions on RNA structures studied with molecular dynamics simulations

Nina M. Fischer1, Marcelo D. Polêto1,2, Jakob Steuer1,3 and David van der Spoel1,*

1Uppsala Centre for Computational Chemistry, Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, SE-75124 Uppsala, Sweden, 2Center of Biotechnology, Universidade Federal do Rio Grande do Sul, Bento Gonçalves 9500, BR-91500-970 Porto Alegre, Brazil and 3Department of Chemistry, University of Konstanz, Universitätstraße 10, D-78457 Konstanz, Germany

Received August 14, 2017; Revised February 16, 2018; Editorial Decision March 09, 2018; Accepted April 23, 2018

ABSTRACT

The structure of ribonucleic acid (RNA) polymers is strongly dependent on the presence of, in particular, Mg2+ cations to stabilize structural features. Only in high-resolution X-ray crystallography structures can ions be identified reliably. Here, we perform molecular dynamics simulations of 24 RNA structures with varying ion concentrations. Twelve of the structures were helical and the others complex folded. The aim of the study is to predict ion positions but also to evaluate the impact of different types of ions (Na+ or Mg2+) and the ionic strength on structural stability and variations of RNA.
As a general conclusion, Mg2+ is found to conserve the experimental structure better than Na+ and, where experimental ion positions are available, they can be reproduced with reasonable accuracy. If a large surplus of ions is present, the added electrostatic screening makes prediction of binding sites less reproducible. Distinct differences in ion binding between helical and complex folded structures are found. The strength of binding (ΔG‡ for breaking RNA atom-ion interactions) is found to differ between roughly 10 and 26 kJ/mol for the different RNA atoms. Differences in stability between helical and complex folded structures and in the influence of metal ions on either are discussed.

INTRODUCTION

Positively charged ions play an essential role in the structural stability of RNA molecules. In particular, Mg2+ ions facilitate high structural complexity and folding arrangements that allow RNA molecules to perform various cellular functions (1). Apart from the canonical functions assigned to RNA molecules, such as involvement in protein synthesis as messenger (m) or transfer (t) RNA, it is nowadays well established that RNAs act in many other biological processes. RNA molecules are for instance involved in gene regulation (e.g. small nuclear (sn), micro (mi) and small interfering (si) RNAs) and in enzymatic activity (e.g. ribozymes and ribonucleoproteins). In eukarya they also play a role in resistance to pathogenic and parasitic invaders (2,3). Some of these functions depend on the presence of metal ions. Mg2+ ions not only stabilize specific RNA structures (4), but also help to recognize binding partners and mediate catalytic processes (5–7). Hammerhead ribozymes are one well-known example that requires metal ions to be present both for obtaining the correct three-dimensional fold and for performing the ribozyme's function (8–12). The need for positively charged ions in close proximity to RNA molecules is not surprising given their negatively charged backbone.
Each RNA nucleotide contains one phosphate group that carries one negative charge. Positive ions shield the negative charges on the RNA backbone, reducing repulsive forces and thereby allowing intramolecular interactions and compact RNA biopolymers. RNA molecules are stabilized internally by hydrogen bonds between nucleotides in the same plane and by base stacking.

Positively charged ions can be divided into two main groups: (a) ions that bind to structurally well-defined sites in direct contact with or close to the RNA and (b) ions that form a cloud surrounding the RNA molecule (13). The classification of Mg2+ binding has been further refined (1,14,15): inner-sphere ions that form direct bonds with RNA atoms, outer-sphere ions that bind to the RNA via a single hydration shell, diffuse Mg2+ ions that bind via multiple hydration shells, and free ions on which the RNA's charge has no direct effect. Monovalent ions can also be part of the first group and can bind sequence-specifically to electronegative pockets formed by RNA structures (16–18); they should not just be considered to be part of a diffuse ionic cloud as described by Manning (19). Folding studies have shown that tRNA thermodynamic stability increases when adding monovalent (in particular Na+ and K+) and divalent (Mg2+) ions (20–23). Mg2+ ions are, however, the most effective for stabilizing the native structure of RNA molecules (22,24). For compensating negative RNA backbone charges, fewer divalent ions are required compared to the number of monovalent ions. In addition, divalent ions bind more strongly and are able to bind several phosphate oxygen atoms at once. Finally, Mg2+ is small compared to other divalent ions and can bind to narrow well-defined pockets within RNA structures (13,25–30).

*To whom correspondence should be addressed. Tel: +46 18 471 4205; Fax: +46 018 530396; Email: david.vanderspoel@icm.uu.se
© The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Despite the growing number of experimentally solved RNA structures, the number of structures with well-resolved ion binding sites is still limited (31,32). A non-redundant data set of RNA structures (May 2017) contains in total 1155 structures with a resolution better than 4 Å. Just 37% of the RNA structures solved by X-ray crystallography include Mg2+ ion positions and 10% include Na+ ion positions. Less than one percent of the structures solved by NMR contain Mg2+ or Na+ ions. An important reason for this is that Mg2+ ions, Na+ ions, and water molecules have the same number of electrons. Thus, it is difficult to distinguish them from one another in electron density maps alone. They can only be assigned unambiguously in very high-resolution X-ray crystal structures (13,33–35). In recent years, special NMR protocols have been developed to study metal ion binding to RNA. However, special sample preparation is needed to be able to detect Mg2+ and Na+ ions (36).

Recent studies describe monovalent and divalent ion binding and the influence of different ions on RNA structures. Molecular dynamics (MD) simulations have been reported suggesting that many Mg2+ ions are strongly associated with RNA, but not directly bound (15). A review by Lipfert et al.
(37) describes in detail the difference between direct binding of ions and longer-range (‘ion atmosphere’) association with nucleic acids and how this influences the structure and stability of RNA and DNA.

In this paper, we apply explicit solvent molecular dynamics simulations to a dataset of twenty-four RNA structures. This allows us to compare the implications of applying Na+ or Mg2+ ions to helical and complex folded structures, solved by either X-ray crystallography or NMR techniques. By combining results from different structures and running simulations in triplicate we can statistically distinguish true effects of ions from stochastic fluctuations inherent to MD simulations. Obviously, the findings are still dependent on force field quality, and force fields for RNA have not been scrutinized (38) to the same extent as those for proteins (39).

MATERIALS AND METHODS

Dataset of RNA structures

A dataset of 24 RNA structures, consisting of twelve helical and twelve complex folded ones (Table 1), was selected from the Protein Data Bank (PDB, http://www.rcsb.org/pdb/) (40) (Supplementary Figure S1). In both groups six of the twelve structures were obtained by X-ray crystallography and six by NMR experiments. When present in experimental structures, we removed water molecules, ions, or other small molecules to ensure identical starting conditions for all structures in the dataset. In the case of 1D4R (41) we also deleted one separate single RNA strand (chain C) and in 1QC0 (42) the smaller RNA double helix (chains A and B). We define helical structures as those that form one double helix composed of either one or two nucleotide strands. In these structures, there are only very few unpaired nucleotides present, i.e. single nucleotides at the strands' ends (1D4R (41), 4K31 (43) and 413D (44)). In the case of one nucleotide strand forming a double helix, there are unpaired nucleotides in the loop region (1A4D (45), 2LPS (46), 2LV0 (47) and 2L2J (48)).
In two structures, one single nucleotide sticks out of the helical main structure (2LPS (46) and 2QEK (49)). In contrast to helical RNAs, complex folded RNA structures typically have a more globular shape, are often multi-helical RNAs, and may be categorized as, e.g., ribozymes or pseudoknots.

Ion selection and parameters

MD simulations that intend to mimic cellular conditions should use K+ as the monovalent ion, since it is the monovalent ion primarily found inside cells. However, we observed the formation of salt crystals of K+ and Cl- ions in test simulations (data not shown). The same issue was previously described by other groups (65) when using K+ instead of Na+ ions. For this reason, we chose Na+ as the representative monovalent ion in most of the simulations. Furthermore, for Mg2+ we used force field parameters refined by Allner et al. (66), which reproduce Mg2+ hydration free energies and exchange rates well. Although it is known that the identity of the counter-ion matters for the composition of the ion cloud around nucleic acids (67), we have only used Cl- ions in this work. We also note that other cation force field parameters have been proposed for use in conjunction with nucleic acids (68); however, rather than comparing many different ion parameter sets, we here focus on comparing different RNA structures.

Molecular dynamics simulations

RNA topologies were built using the parm99 (69) force field with the GROMACS simulation package (version 4.6.7) (70). First, an in vacuo minimization was carried out. Then, each structure was placed in a rhombic dodecahedron box filled with TIP3P water molecules to reach a ratio of 550 water molecules/nucleotide.
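The solvation rule above, together with the 0.15 mol/l NaCl used for the physiological-salt systems described next, can be turned into rough particle counts. The sketch below is only a back-of-the-envelope estimate: it assumes a molarity of about 55.5 mol/l for pure water, and the function names `water_count` and `salt_pairs` are hypothetical helpers, not part of GROMACS or of the original setup scripts.

```python
# Assumption: molarity of pure water near room temperature, in mol/l.
WATER_MOLARITY = 55.5


def water_count(n_nucleotides, waters_per_nucleotide=550):
    """Number of water molecules for the stated waters-per-nucleotide ratio."""
    return n_nucleotides * waters_per_nucleotide


def salt_pairs(n_waters, molar=0.15):
    """Approximate number of NaCl ion pairs giving `molar` mol/l.

    Uses the ratio of salt to water molarity, which neglects the volume
    taken up by the solute and the ions (an approximation).
    """
    return round(molar * n_waters / WATER_MOLARITY)


# Hypothetical usage for a 24-nucleotide and a 119-nucleotide RNA.
small = water_count(24)
large = water_count(119)
print(small, salt_pairs(small))
print(large, salt_pairs(large))
```

Counterions needed to neutralize the RNA itself come on top of these salt pairs and depend on the net charge of each structure, which is not estimated here.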
During this step four different systems are created (Supplementary Table S1), either with just counterions (CI) or at physiological salt (PS) concentration:
• Na+CI, just Na+ counterions,
• Na+PS, Na+ counterions plus 0.15 mol/l NaCl,
• Mg2+CI, just Mg2+ counterions,
• Mg2+PS, Mg2+ counterions plus 0.15 mol/l NaCl.
All ions were placed randomly in the simulation box. We verified that Mg2+ ions kept an initial distance of more than 2 Å from any RNA atom and that direct interactions did not occur during the minimization and the first 4 ns of the equilibration phase.

Table 1. Dataset of 24 RNA structures, twelve X-ray (six helical and six complex folded) and twelve NMR (six helical and six complex folded) structures, taken from the Protein Data Bank (PDB, http://www.rcsb.org/pdb/) (40). The fraction of single nucleotides (Nuc.) is calculated for each structure. Helical structures that have only base pairs have a fraction of 0

Method/PDB id | Classification/system | Fraction unpaired Nuc. | Nuc. | A | C | G | U | Mg2+, K+
X-ray | Helical
1D4R (41) | Signal recognition particle RNA | 0.04 | 54 | 8 | 18 | 22 | 6 | 4
1QC0 (42) | Plasmid copy control related RNA | 0.00 | 38 | 6 | 13 | 13 | 6 | -
2QEK (49) | HIV-1 dimerization initiation site | 0.04 | 46 | 14 | 12 | 12 | 8 | 1
4K31 (43) | rRNA A-site | 0.05 | 44 | 8 | 12 | 14 | 10 | -
413D (44) | A-form RNA double helix | 0.04 | 26 | 2 | 8 | 8 | 8 | -
420D (50) | RNA with A(anti)-G(syn) mispairs | 0.00 | 32 | 10 | 6 | 8 | 8 | -
X-ray | Complex folded
4B5R (51) | SAM-I riboswitch | 0.30 | 94 | 28 | 22 | 32 | 12 | -
4FEJ (52) | Guanine riboswitch aptamer | 0.31 | 67 | 16 | 17 | 17 | 17 | -
4FRG (53) | Cobalamin riboswitch aptamer | 0.34 | 84 | 27 | 19 | 22 | 16 | 7
4JF2 (54) | Class II preQ1 riboswitch | 0.37 | 76 | 19 | 19 | 18 | 20 | 4
4KQY (55) | S-box (SAM-I) riboswitch | 0.34 | 119 | 40 | 27 | 33 | 19 | 2
4P5J (56) | tRNA-like structure | 0.36 | 83 | 17 | 28 | 22 | 16 | 2
NMR | Helical
1A4D (45) | Loop D/Loop E arm of 5S rRNA | 0.07 | 41 | 6 | 10 | 17 | 8 | -
2D18 (57) | HIV-1 dimerization initiation site | 0.00 | 34 | 8 | 10 | 12 | 4 | -
2KYD (58) | A-form RNA double helix | 0.00 | 32 | 10 | 6 | 6 | 10 | -
2L2J (48) | R/G stem loop RNA | 0.10 | 42 | 7 | 13 | 13 | 9 | -
2LPS (46) | ai5(gamma) group II intron | 0.12 | 34 | 8 | 8 | 11 | 7 | -
2LV0 (47) | Stem–loop from 23S rRNA | 0.17 | 24 | 8 | 4 | 6 | 6 | -
NMR | Complex folded
1YMO (59) | Telomerase RNA pseudoknot | 0.36 | 47 | 13 | 13 | 9 | 12 | -
2ADT (60) | Tetraloop–receptor complex | 0.31 | 86 | 22 | 16 | 26 | 22 | -
2LKR (61) | U2/U6 snRNA | 0.37 | 111 | 29 | 22 | 25 | 35 | -
2MHI (62) | CR4/5 domain of telomerase RNA | 0.25 | 53 | 7 | 15 | 17 | 14 | -
2MTK (63) | Ribozyme’s III-IV-V junction | 0.36 | 47 | 7 | 13 | 16 | 11 | 6
2M8K (64) | Telomerase RNA pseudoknot | 0.30 | 48 | 12 | 9 | 6 | 21 | -
Total | Average | | 56 | 13 | 14 | 16 | 13 | 3

All systems were minimized once more to eliminate any possible clashes and bad contacts. Subsequently, seven equilibration steps were carried out to provide a careful equilibration protocol. First, an NVT simulation was conducted for 2 ns using position restraints with a force constant of 1000 kJ/(mol×nm²) on all heavy atoms. During this step the system was heated up to 300 K. Then, six NPT simulations were conducted at 1 bar and 300 K for 26 ns in total.
The number of restrained RNA atoms and the restraining force constant were gradually reduced while ions were given time to occupy preferred binding sites. Finally, production runs were carried out for 50 ns at 300 K and 1 bar, with no restraints. An integration step of 2 fs was applied and all bonds were constrained using the LINCS algorithm (71,72). A cutoff of 10 Å was used for Lennard–Jones and short-range Coulomb interactions, and the particle mesh Ewald (PME) method (73) was used for long-range electrostatic interactions. Velocity rescaling (74) was used for temperature coupling with a time constant of 0.1 ps in order to ensure correct temperature fluctuations. For simulations at constant pressure we used the Parrinello-Rahman pressure coupling algorithm (75) with a time constant of 2 ps. Each of the four systems was simulated three times in order to ensure statistical significance of our analyses. This resulted in 288 simulations (24 structures × 4 systems × 3 replicas) and an overall simulation time of 14.4 µs for all production phases.

Force field evaluations
Since most of our simulations were done using a somewhat old force field, an updated set of force field parameters for nucleic acids was tested, namely parmbsc0 (82) in conjunction with parmOL (83,84), for a subset of four RNA structures. For this subset, we also evaluated the difference between K+ and Na+ counterions.

RMSD and εRMSD
The root mean square deviation (RMSD) was computed using GROMACS between the in vacuo minimized structure and the snapshots taken every 10 ps during the 50 ns production run (Supplementary Figures S4 and S5). The same was done to obtain the εRMSD values with a method developed by Bottaro et al. (76) (Supplementary Figures S6 and S7).
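As a point of reference, the RMSD between a reference structure and each snapshot can be sketched with NumPy as below. This is only a minimal illustration and not the GROMACS implementation: it assumes that the frames have already been superimposed on the reference, it applies no mass weighting, and the function name `rmsd_per_frame` and the toy coordinates are hypothetical.

```python
import numpy as np

def rmsd_per_frame(reference, trajectory):
    """RMSD of each frame to the reference.

    reference:  (n_atoms, 3) coordinates
    trajectory: (n_frames, n_atoms, 3) coordinates, assumed pre-aligned
    Returns an array with one RMSD value per frame, in the input units.
    """
    reference = np.asarray(reference, dtype=float)
    trajectory = np.asarray(trajectory, dtype=float)
    diff = trajectory - reference            # broadcast over frames
    sq_dist = np.sum(diff ** 2, axis=2)      # squared displacement per atom
    return np.sqrt(sq_dist.mean(axis=1))     # average over atoms, then root

# Toy example (hypothetical data): two atoms, two frames
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
traj = np.array([ref, ref + np.array([0.0, 0.3, 0.4])])
rmsd_values = rmsd_per_frame(ref, traj)
```

In the toy example the second frame is a rigid shift of every atom by 0.5 length units, so its RMSD equals that shift; a real analysis would first remove such rigid-body motion by least-squares fitting.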
Since each RNA structure was simulated three times in four different ionic conditions, the mean and standard deviation of the RMSD values were determined for each replica. The mean RMSD values were subtracted from the individual values for all four systems for each structure. The 18 standardised RMSD/εRMSD values (six structures × three replicas) of systems Na+PS and Mg2+PS in each of the four structure groups were used to determine the p-values using a t-test between system Na+PS and system Mg2+PS. Finally, we calculated mean RMSD values over the three replicas and the error for each structure in each of the four ionic conditions.

Radial distribution functions
The radial distribution functions (RDFs) were determined using GROMACS and trajectories with structures taken every 10 ps of the 50 ns production run. For each of the four systems the RDFs were calculated between the positively charged ions (Na+ and/or Mg2+) present in the system and each of the RNA base atoms (A-N1, A-N3, A-N6, A-N7, A-N9, G-O6, G-N1, G-N2, G-N3, G-N7, G-N9, C-O2, C-N1, C-N3, C-N4, U-O2, U-O4, U-N1 and U-N3), the two phosphate oxygen atoms (O1P, O2P) and the sugar oxygen atoms (O2′, O3′, O4′ and O5′). In all structures, O1P is the atom that points towards the solvent and O2P (particularly in helical structures) towards the minor groove. The average RDFs were calculated for each RNA atom over the 12 helical and 12 complex folded structures.

Free energy of activation
The same RNA atoms and positively charged ions as described in the RDF analysis were used to determine the free energy of activation ΔG‡ for contact breaking. The contact distance between an ion and a given RNA atom is required as an input parameter; we used the minimum between the first and second maxima of the corresponding RDF for each structural replica. Similarly, the minimum between the second and third maxima determines the contact distance for second shell contacts.
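The contact-distance rule can be illustrated with a short Python sketch. It is a simplified assumption of how such minima might be located, not the procedure actually used: the function name `shell_cutoffs` and the toy RDF are hypothetical, the curve is assumed to be smooth (a noisy RDF would need smoothing first), and each minimum is reported as soon as it follows a maximum, without checking that a further maximum exists.

```python
import numpy as np

def shell_cutoffs(r, g, max_shells=2):
    """Distances of the RDF minima that follow the first maxima.

    r, g: 1D arrays of distances and RDF values on the same grid.
    The first value bounds the first shell, the second the second shell.
    Fewer values are returned when fewer minima are found.
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    cutoffs = []
    seen_maximum = False
    for i in range(1, len(g) - 1):
        if g[i] > g[i - 1] and g[i] >= g[i + 1]:
            seen_maximum = True                  # local maximum
        elif seen_maximum and g[i] < g[i - 1] and g[i] <= g[i + 1]:
            cutoffs.append(float(r[i]))          # minimum after a maximum
            seen_maximum = False
            if len(cutoffs) == max_shells:
                break
    return cutoffs

# Toy RDF (hypothetical values) with peaks at 2.0 and 4.0
r = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
g = [0.0, 0.5, 3.0, 1.0, 0.2, 0.8, 1.6, 1.1, 0.9, 1.0, 1.0]
cutoffs = shell_cutoffs(r, g)
```

An empty or short result signals that no shell boundary was detected for that atom, which is the situation where a fallback value is needed.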
When no peak could be detected within a certain cutoff distance (3.5 or 6.0 Å), the contact distance for this RNA atom was calculated as the average over all minima of all other RNA atoms in this structure.

Figure 1. Standardized mean RMSD (nm) values are plotted against standardized mean εRMSD values for each RNA structure. RMSD and εRMSD values for helical X-ray and NMR as well as for complex folded X-ray and NMR structures are obtained during the production run for each replica and each system. First, an average value is determined over the production run RMSD values for all simulations. Second, a mean value over the three average replica values for each structure and system is calculated, resulting in 4 × 24 data points with corresponding standard errors. In a last step, these data points are standardized, separately for each structure, by the mean over the four data points of that structure. This results in 4 × 24 data points with standard errors for RMSD and εRMSD values, which are plotted against each other. Each RNA structure is represented with a different symbol and the four systems are represented with: Na+CI (green), Na+PS (blue), Mg2+CI (orange), Mg2+PS (red). The p-values are calculated with a t-test between system Na+PS and system Mg2+PS.

Ion binding sites in RNA structures
The occupancies of Na+ and Mg2+ ions in close proximity of the RNA structure were computed using the program MobyWat (80,81). For the seven RNA structures with experimentally determined ion positions, snapshots were taken every 500 ps from the equilibration phase and every 250 ps from the production phase trajectory. These snapshots were superimposed on the experimental RNA structure while only considering atoms with root mean square fluctuation (RMSF) values <4 Å. RMSF values were obtained relative to the RNA structure closest to the average structure of the second half of the production run.
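The RMSF-based atom selection used for the superposition can be sketched as follows. This is an illustrative simplification under stated assumptions: the trajectory is taken to be already fitted, the fluctuation is measured around the mean position of each atom rather than around the frame closest to the average structure, coordinates are in Å, and `low_rmsf_mask` and the toy data are hypothetical.

```python
import numpy as np

def low_rmsf_mask(trajectory, threshold=4.0):
    """Boolean mask of atoms whose RMSF is below `threshold`.

    trajectory: (n_frames, n_atoms, 3) pre-aligned coordinates in angstrom.
    RMSF is taken around the mean position of each atom (a simplification).
    Returns the mask and the per-atom RMSF values.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    mean_pos = trajectory.mean(axis=0)                      # (n_atoms, 3)
    sq_dev = np.sum((trajectory - mean_pos) ** 2, axis=2)   # (n_frames, n_atoms)
    rmsf = np.sqrt(sq_dev.mean(axis=0))                     # (n_atoms,)
    return rmsf < threshold, rmsf

# Toy example (hypothetical data): atom 0 is rigid, atom 1 moves by +/- 5 angstrom
traj = np.array([
    [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]],
])
mask, rmsf = low_rmsf_mask(traj)
```

Only the atoms selected by the mask would then enter the least-squares fit, so that highly mobile regions do not distort the superposition.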
The input parameters for MobyWat that differ from the default parameters are the following: the maximum and minimum distance limits were set to 6.0 and 1.0 Å and the clustering tolerance to 1.5 Å. The results are based on the MER clustering algorithm, which yielded the best results when comparing experimental and predicted ion binding sites. The top 50 predicted ion binding sites were used and RMSD values with respect to the experimental ones were calculated for each of them. The predicted ion binding site with the smallest RMSD to an experimental ion binding site was considered as a potential binding site.

RESULTS
Structural changes
Figure 1 illustrates how the surrounding environment, especially the presence or absence of Mg2+ ions, influences RNA structural changes during MD simulations. In agreement with other studies, root mean square deviation (RMSD) values are lower for RNA structures simulated with Mg2+ ions than without. In addition to RMSD values, we calculated εRMSD values (76). This metric discriminates effectively between structurally and kinetically different RNA conformations. It directly describes variations in base-base interactions and therefore captures whether or not important structural characteristics, like base pairs, are preserved during the simulation. Bottaro et al. (76) showed that multiple different secondary RNA structures can be found within 4 Å RMSD of each other. Two RNA structures with low RMSD values to a reference structure therefore do not necessarily have the same secondary structure. Indeed, the base-base interactions could be completely lost in one structure and not in the other. This kind of structural difference is captured by εRMSD values, which take structural information about base-pairing into account.
An εRMSD of <0.8 indicates that all base-base contacts are close to the native experimental structure and an εRMSD of >1 suggests that non-native base-base contacts occur in the structure (76). More structures have an average (over three replicas) εRMSD >1 in systems simulated without Mg2+ than in systems simulated with Mg2+ ions (Supplementary Table S2). The six complex folded NMR structures almost always have εRMSD values >1 regardless of the surrounding ionic environment; the exceptions are two structures (PDB ids: 2ADT and 2MTK) when simulated with Mg2+ and NaCl. None of the structures obtained by X-ray crystallography have average εRMSD values >1 when Mg2+ ions were present during the simulation. In general, structures simulated with Mg2+ ions have lower RMSD and εRMSD values compared to structures simulated without Mg2+ ions. We performed a statistical (kernel density) comparison test that compares the distributions of two-dimensional data points. It returns a p-value that is higher for better fits between the two distributions. Our null hypothesis is that the distributions are independent of whether Mg2+ ions are present in the simulations. When comparing simulations at 0.15 M NaCl with and without Mg2+ ions, the p-values for X-ray structures are both less than 0.001 (see Figure 1 for individual p-values). This indicates that Mg2+ contributes significantly to maintaining native base-base contacts during the simulations, at least for X-ray RNA structures. The p-values for NMR structures are higher and therefore not statistically significant.

Ion binding
In order to analyze where ions are located during the simulations, radial distribution functions (RDFs) were derived for positively charged ions present during the simulation and all RNA atoms. Direct contacts are identified between Na+ and 16 RNA atoms (O1P, O2P, sugar oxygen atoms, A-N1, A-N3, A-N7, G-O6, G-N7, C-O2, C-N3, U-O2 and U-O4).
Mg2+ ions form direct contacts with an RNA atom only twice in the simulations. In one case Mg2+ binds directly to an O1P atom and in another structure to a cytosine oxygen (C-O2). All other Mg2+ interactions with RNA occur indirectly via water molecules.

Figure 2. Average radial distribution functions (RDFs) for Na+ and Mg2+ present in the simulations and seven RNA atoms (O1P, O2P, A-N7, G-O6, G-N7, U-O2, U-O4). The average RDFs for 12 helical and 12 complex folded structures are colored by system: Na+CI (green), Na+PS (blue), Mg2+CI (orange) and in system Mg2+PS there are Mg2+ (red) and Na+ (purple).

Clearly recognisable first and second shell contacts between positively charged ions and RNA atoms can be determined for seven RNA atoms in the RDFs (Figure 2). The RDF peaks are higher for Na+ ions in system Na+CI compared to Na+ ions in systems Na+PS and Mg2+PS, and for Mg2+ ions in system Mg2+CI compared to Mg2+ ions in system Mg2+PS. In both counterion-only systems (Na+CI and Mg2+CI) fewer positively charged ions are therefore present in the bulk water surrounding the RNA molecule. We use the same arbitrary definition of bulk water/ions (distance >20 Å to any RNA atom) as described by Hayes et al. (15). In system Mg2+PS, when both Na+ and Mg2+ ions are present, more Mg2+ ions are found in the bulk solvent than in system Mg2+CI. Nevertheless, fewer Na+ ions are found close to the RNA in system Mg2+PS compared to system Na+PS and also system Na+CI. When comparing helical versus complex folded RDFs, fewer Na+ ions are in direct contact with helical phosphate oxygen atoms compared to complex folded ones. In both structure groups there is a preference for O2P over O1P for Na+ as well as Mg2+ ions.
For the nitrogen atom in adenine (A-N7) we observe differences between the RDFs of helical and complex folded structures only for Na+ ions in the second and fourth systems. For A-N7 there seems to be a preference for Na+ first shell binding compared to Mg2+ second shell binding interactions. When comparing Na+ direct binding between helical and complex folded structures for both favoured guanine atoms (G-O6 and G-N7), there are more occurrences in complex folded structures, with a slight preference for G-N7. This is in contrast to second shell Mg2+ interactions, where G-O6 is the preferred atom. The peak for helical G-O6 atoms and Mg2+ ions in system Mg2+PS is the highest determined for all RDFs. For U-O2 atoms very few ion contacts were found, and most of them are found in complex folded structures. Since helical structures mostly form Watson-Crick base pairs, U-O2 atoms lie in the minor groove and are therefore likely not easily accessible to ions (Figure 2). The other uracil oxygen atom (U-O4) is slightly preferred by Na+ ions in first shell interactions in complex folded structures and by Mg2+ ions in second shell binding for helical structures.

Ion binding energetics
An analysis method that was developed to study kinetics of hydrogen bond breaking and forming (77) and thermodynamics of hydrogen bond breaking in different environments (78) was used here for studying ion-binding energetics. This method yields the Gibbs energy of activation ΔG‡ for contact breaking and was previously applied to RNA-ion contacts in a study of viral RNA (79). The highest energy for breaking first shell contacts was found between one phosphate oxygen atom (O1P) and a Mg2+ ion (Figure 3). This is the result of one of the two direct interactions between a Mg2+ ion and an RNA atom that occurred in the simulations, as also observed in the RDF analysis.
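The text does not restate how an activation energy follows from contact kinetics. One common route, assumed here purely for illustration and not confirmed as the exact procedure of refs (77,78), is the Eyring relation k = (k_B T / h) exp(−ΔG‡ / RT) between a rate constant k for contact breaking and ΔG‡. The function names below are hypothetical, and the 30.0 kJ/mol input is the value quoted in the Discussion for the direct Mg2+ contact with O1P.

```python
import math

R = 8.314462618e-3      # gas constant, kJ/(mol K)
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J s

def activation_energy(rate, temperature=300.0):
    """Eyring estimate of the activation free energy (kJ/mol) from a rate in 1/s."""
    prefactor = K_B * temperature / H
    return -R * temperature * math.log(rate / prefactor)

def rate_constant(delta_g, temperature=300.0):
    """Inverse relation: rate constant (1/s) from an activation energy in kJ/mol."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-delta_g / (R * temperature))

# Round trip for an activation energy of 30.0 kJ/mol at 300 K
k_break = rate_constant(30.0)
dg_back = activation_energy(k_break)
```

Under this assumption an activation energy of 30 kJ/mol at 300 K corresponds to a contact-breaking rate of a few times 10^7 per second, which gives a feeling for why such contacts rarely break within a 50 ns trajectory.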
The energy of first shell contacts between Na+ ions and RNA phosphate oxygen atoms (O1P and O2P) is not as high as for other RNA atoms (A-N7, G-O6, G-N7, C-O2, C-N3, U-O2 and U-O4). These results differ from the RDF results insofar as the RDF peaks for C-O2, C-N3 and U-O2 atoms are very low, especially compared to the peaks of phosphate oxygen atoms. The main difference between helical and complex folded first shell contacts is for atoms that are only available for interactions in complex folded structures (A-N3, A-N9, G-N9, U-N1). A-N3 lies in the minor groove in helical structures and the other atoms are the base atoms that are closest to the sugar ring. Therefore, they are not easily accessible to ions in helical RNA structures.

Figure 3. Gibbs energy of activation for contact breaking between RNA atoms and ions. The average energy values for 12 helical (A, C) and 12 complex folded (B, D) structures are determined for first shell interactions (direct bonds, A, B) and second shell interactions (C, D). Second shell energies are the sum of first and second shell interactions. The colors represent the corresponding ion in each of the four systems: Na+CI (green), Na+PS (blue), Mg2+CI (orange) and in system Mg2+PS Mg2+ (red) and Na+ (purple).

Ion binding positions
To investigate whether Mg2+ ions find experimentally identified binding sites when initially placed randomly in the solvent (with a distance >2 Å to any RNA atom), we determined the occupancy of Na+ and Mg2+ ions during the simulation using the software MobyWat (80,81). Figure 4 shows the top 10 predicted binding sites for Mg2+ and Na+ ions for one of the three replicas of each system during the equilibration phase, superimposed on the X-ray structure of 2QEK (49). We chose 2QEK as an example structure because both monovalent (K+) and divalent (Mg2+) ions are present in this structure. When only Na+ ions are present in the simulation both K+ and Mg2+ binding sites are occupied (Figure 4A and B).
This can also be observed for other structures (Table 2). In some cases it seems as if the binding site can be occupied by both Na+ and Mg2+ ions. Mg2+ binding sites are more difficult to predict with MD simulations since the hydration layer around Mg2+ ions is almost never dismantled. When only Mg2+ ions are present in the simulation, ions are closer to RNA atoms compared to when both Mg2+ and Na+ are present.

Table 2. Average RMSD values between experimental and predicted binding sites during the production phase. The positions of ions are predicted with MobyWat (80,81). The resulting top 50 ion positions are considered for each replica

Columns (PDB id: ion): 1D4R: MG-90, MG-91; 2MTK: MG-48*, MG-49, MG-50, MG-51, MG-52, MG-53; 2QEK: K-47, K-48, MG-49, K-50; 4FRG: MG-179, MG-180, MG-181, MG-182*, MG-183, MG-184, MG-185; 4JF2: MG-94, MG-95*, MG-96, MG-97*; 4KQY: MG-121, MG-122*; 4P5J: MG-85, MG-86
System Na+CI, ion Na+: 2.8 ± 1.7, 6.0 ± 1.0, 7.5 ± 0.9, 5.2 ± 1.2, 4.2 ± 1.0, 2.6 ± 0.7, 5.9 ± 0.4, 3.3 ± 1.2, 3.3 ± 1.2, 3.3 ± 0.7, 3.6 ± 0.6, 1.6 ± 0.7, 2.2 ± 0.8, 1.5 ± 0.5, 2.5 ± 1.0, 8.0 ± 0.6, 5.7 ± 1.2, 1.5 ± 0.6, 3.2 ± 1.3, 1.4 ± 0.7, 7.0 ± 0.7, 1.8 ± 0.1, 26.6 ± 7.7, 1.8 ± 0.5, 3.4 ± 1.9, 2.3 ± 0.4, 2.2 ± 0.4
System Na+PS, ion Na+: 2.4 ± 1.0, 6.9 ± 2.1, 7.7 ± 0.8, 4.3 ± 1.2, 4.6 ± 1.7, 1.7 ± 1.0, 4.1 ± 1.3, 3.1 ± 1.5, 2.7 ± 0.3, 3.4 ± 0.5, 2.4 ± 1.1, 2.1 ± 0.6, 2.2 ± 0.3, 1.4 ± 0.2, 4.2 ± 2.8, 9.1 ± 2.3, 4.7 ± 1.2, 1.6 ± 0.6, 3.4 ± 0.7, 1.5 ± 0.4, 6.5 ± 1.8, 1.6 ± 0.8, 31.1 ± 6.3, 1.6 ± 0.7, 5.0 ± 0.4, 2.9 ± 0.6, 2.5 ± 0.2
System Mg2+CI, ion Mg2+: 1.0 ± 0.5, 5.0 ± 0.7, 7.4 ± 3.2, 3.9 ± 0.9, 6.7 ± 3.0, 3.2 ± 0.8, 3.6 ± 0.5, 2.1 ± 0.4, 2.1 ± 0.6, 3.4 ± 0.1, 2.5 ± 0.2, 2.0 ± 0.7, 2.4 ± 0.7, 2.4 ± 0.8, 2.8 ± 0.5, 7.6 ± 0.5, 3.7 ± 1.5, 1.1 ± 0.3, 3.7 ± 1.3, 2.2 ± 1.1, 3.2 ± 0.6, 2.5 ± 0.8, 18.3 ± 2.7, 1.8 ± 0.4, 1.8 ± 0.5, 1.1 ± 0.7, 2.5 ± 0.5
System Mg2+PS, ions Na+ and Mg2+: 4.2 ± 1.2, 1.1 ± 0.5, 5.6 ± 1.5, 4.4 ± 0.7, 8.1 ± 4.0, 5.6 ± 3.0, 4.7 ± 1.4, 3.9 ± 1.3, 3.9 ± 2.5, 3.2 ± 0.5, 5.8 ± 1.9, 2.9 ± 1.6, 5.7 ± 2.3, 3.5 ± 0.4, 3.8 ± 2.4, 2.3 ± 1.1, 2.8 ± 2.4, 3.7 ± 2.4, 2.9 ± 0.8, 4.4 ± 3.7, 1.8 ± 0.5, 5.2 ± 2.4, 2.5 ± 1.3, 2.5 ± 0.4, 1.6 ± 0.2, 1.7 ± 0.2, 8.5 ± 0.8, 11.5 ± 3.6, 3.5 ± 1.7, 3.5 ± 1.4, 2.1 ± 0.3, 4.4 ± 0.8, 5.3 ± 0.4, 1.4 ± 0.5, 7.0 ± 0.6, 4.7 ± 2.9, 2.0 ± 1.5, 5.9 ± 1.4, 2.5 ± 1.7, 4.7 ± 1.8, 2.6 ± 0.3, 24.9 ± 6.1, 2.7 ± 1.0, 4.6 ± 0.7, 2.9 ± 0.9, 20.6 ± 0.8, 0.8 ± 0.6, 3.4 ± 0.9, 8.4 ± 2.3, 4.2 ± 2.0, 3.1 ± 1.4, 1.8 ± 0.2, 5.3 ± 3.2, 3.5 ± 2.1

Figure 4. Experimental and predicted ion binding sites. The helical structure (PDB id: 2QEK) has one Mg2+ (solid green) and three K+ (solid purple) binding sites. During the equilibration phase we observe Mg2+ and Na+ in close proximity to the experimentally determined binding sites. The RMSD values between experimental and predicted binding sites are given in Å. The 10 top ranked ion binding sites predicted with MobyWat for one replica of each of the four systems are shown: (A) Na+CI (green), (B) Na+PS (blue), (C) Mg2+CI (orange), (D) in Mg2+PS there are Mg2+ (red) and Na+ (purple).

The closest distance between experimentally determined Mg2+ binding sites and those observed in our simulations is 1–2 Å. Overall, Mg2+ binding sites are predicted better than K+ binding sites. This indicates a preference of Mg2+ ions for experimentally determined Mg2+ ion binding sites. When both Na+ and Mg2+ ions are present in the simulations, the distances to experimentally determined binding sites are higher compared to other systems. This is surprising since it does not correlate with lower RMSD or εRMSD values for those structures. It indicates that although specific ion positions are not found during MD simulations the overall structure maintains a native-like fold, potentially due to there being a ‘sufficient’ amount of screening of electrostatic interactions. In general we observe Mg2+ ions present along the minor groove of the RNA and in some specific binding sites.
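The matching of predicted to experimental sites behind Table 2 can be sketched as below. This is a hedged illustration, not the MobyWat workflow itself: it assumes that both sets of positions are already in the same superimposed frame, that for a single ion position the RMSD reduces to the Euclidean distance, and that `closest_predicted_site` and the toy coordinates are hypothetical.

```python
import numpy as np

def closest_predicted_site(experimental_site, predicted_sites):
    """Index and distance of the predicted site nearest to an experimental one.

    experimental_site: (3,) coordinates
    predicted_sites:   (n_sites, 3) coordinates in the same frame and units
    """
    experimental_site = np.asarray(experimental_site, dtype=float)
    predicted_sites = np.asarray(predicted_sites, dtype=float)
    distances = np.linalg.norm(predicted_sites - experimental_site, axis=1)
    best = int(np.argmin(distances))
    return best, float(distances[best])

# Toy example (hypothetical coordinates in angstrom)
exp_site = [0.0, 0.0, 0.0]
pred = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0], [0.0, 6.0, 0.0]]
best_index, best_distance = closest_predicted_site(exp_site, pred)
```

Repeating this for each experimental ion and each replica, and then averaging the resulting distances over replicas, would give mean values with a spread in the same format as the table entries.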
The predicted binding sites are in good agreement with experimentally identified ion locations in close proximity to the RNA (Table 2). There are some cases, however, for which the experimental binding site was not detected, in particular for ions directly bound to RNA. This is expected, due to the high barrier for desolvation of Mg2+ ions (66). Most of these sites are located at the surface of the RNA and only one RNA atom can be identified as a potential contact site in the experimentally determined structures. For this reason RMSD values to the experimental binding sites are marked with an asterisk in Table 2 for these ions.

DISCUSSION
The structural analyses indicate (at least in X-ray structures and most NMR structures) that Mg2+ ions have a stronger stabilizing effect for helical structures than for complex folded structures (Figure 1). Although εRMSD values of complex folded structures simulated with and without Mg2+ ions are comparable, they are in general high (above 1), indicating that these structures do not maintain their native fold (76). This might be due to the fact that the quality of complex folded NMR structures is not as good as that of X-ray structures (Supplementary Table S1), for instance because structures that are inherently more flexible and difficult to solve by X-ray crystallography are instead solved by NMR techniques. In particular, the RNA backbone seems not to be as well defined for NMR structures, based on the validation results of X-ray and NMR structures (Supplementary Table S1). It has been reported (38) that helical RNA structures undergo irreversible structural changes in longer MD simulations (over 50 ns) when using parm99 and parmbsc0 (82). They change into a ladder-like structure, similar to what we observe in the majority of helical RNA structures with high RMSD values.
The reason for this is that the glycosidic torsion angle χ is shifted from the anti to the high-anti region. A specific force field parameter set, called parmOL (83,84), has been developed to eliminate this artifact. Since we did not use these parameters for most of the simulations, the correct backbone angle of some helical RNA structures was not maintained in this work. We did, however, use the combination of parmOL (83,84) and parmbsc0 (82) parameters, specifically developed for RNA, for a subset of our structures. These structures undergo fewer structural changes and have lower RMSD and εRMSD values compared to structures that were simulated with the same ion conditions (Supplementary Figure S2). However, our observation that Mg2+ results in more stable simulations still holds. The comparison between Na+ and K+ as a counterion (Supplementary Figures S2 versus S3 and Supplementary Figures S8–S11) suggests that potassium stabilizes the structures somewhat more than sodium does. Both Na+ and Mg2+ ions bind in a sequence-specific manner and also to specific binding sites (Figure 4). In both helical and complex folded structures certain RNA atoms are preferred. In complex folded structures atoms are available for binding that are not sterically accessible to ions in helical RNA structures. For example, one of the oxygen atoms in uracil (U-O2) is hidden in the minor groove of a helical RNA with classical Watson-Crick base pair interactions. In complex folded structures we find this atom to be more accessible to ions (Figure 2), consistent with findings reported by Kirmizialtin et al. (18). We think it is appropriate to distinguish between adenine and guanine N7 atoms, unlike what was done in previous studies (15,18). Doing so reveals that more ions are close to the guanine N7 atom than can be explained based just on accessibility, and indeed the distributions are quantitatively different for both atoms (Figure 2).
At low ion concentrations a larger fraction of the Na+ and Mg2+ ions is in direct contact with the RNA in our simulations than at higher concentrations (2). When, however, both Na+ and Mg2+ ions are present, more Mg2+ ions than Na+ ions are close to the RNA (distance less than 10 Å). This is in agreement with the ‘ion atmosphere’ as described by Lipfert et al. (37). It seems therefore that the overall salt concentration should be factored in when considering the properties of the ‘ion atmosphere’. Zheng et al. (13) investigated Mg2+ ion binding sites experimentally, in particular the difference between first and second shell binding frequencies. Since we only observe two direct contacts for Mg2+ ions in our simulations we cannot compare our simulations with the first shell contact frequencies derived in that work (13). The main reason for this is that it is very difficult to replace the hydration shell around Mg2+ by direct contacts during explicit MD simulations (66). Although refined Mg2+ ion parameters (64) were used, the activation energy remains slightly higher and the ion–water exchange rate faster than experimental values (66,85). When we compare our Gibbs activation energies for second shell dissociation/binding of Mg2+ ions (Figure 3) to the experimental frequencies reported in (13) we see a preference for the same RNA atoms. The calculations fit the results by Zheng et al. (13) remarkably well. The RNA atoms with the highest experimental frequencies are (starting from the highest): G-O6, G-N7, O2P, U-O4, A-N7, O1P and A-N6 (13). For helical structures the RNA atoms with highest ΔG‡ are for system Mg2+PS (starting from the highest): G-O6, G-N7, C-N4, U-O4, A-N6, A-N7, O2P and O1P. For complex folded structures for system Mg2+PS (starting from the highest): G-O6, G-N7, U-O4, A-N6, A-N7, C-N4, O2P, U-O2 and O1P.
For helical structures the RNA atoms with highest free energies of activation ΔG‡ are for system Mg2+CI (starting from the highest): G-O6, G-N7, C-N4, U-O4, A-N6, A-N7, O2P and O1P. For complex folded structures for system Mg2+CI (starting from the highest): G-O6, G-N7, U-O4, A-N6, A-N7, C-N4, O2P, C-O2 and O1P. Although the activation energy is higher for O2P than O1P, it seems to be underestimated in all simulations compared to the energies calculated for other RNA atoms. The main difference between helical and complex folded structures is that in helical ones the activation energy is higher for C-N4. When we compare the activation energy of the Mg2+ ion directly in contact with O1P (30.0 kJ/mol) to experimentally determined activation energies ΔG‡ between Mg2+ ions and DNA (53.1–55.7 kJ/mol) (85), it is quantitatively underestimated. After the equilibration phase we could reproduce all experimentally determined ion binding sites with good accuracy (Table 2, Figure 4). Especially when Na+ or Mg2+ ions are present without any additional salt (systems Na+CI and Mg2+CI), the binding sites are reproduced well using the MobyWat (80,81) analysis (Supplementary Table S3). The reason for this is likely that in simulations at low ionic strength the ions are found in close proximity to the RNA. We find, however, that the occupancy of the experimental ion binding sites calculated from the simulation is not reproduced with the same accuracy as the positions. A similar study focused on ion-binding to helical DNA was able to reproduce experimental ion-counts quantitatively (86), possibly because of improved cation force field parameters (67). An interesting study by Lemkul et al. (87) applied grand-canonical Monte Carlo-MD (GCMC-MD) in order to predict ion-binding for four different RNA molecules.
Although this approach most likely is more suited to find first shell binding locations than the MD approach we used, the use of pure MD allows one to deduce time-dependent properties such as ΔG‡ for contact breaking (Figure 3). A combination of the two techniques, prediction using GCMC-MD followed by regular MD, would therefore yield a more complete picture of binding thermodynamics and kinetics. Nevertheless it seems the quality of binding site predictions is similar in both methods. Ion binding site prediction is inherently difficult for these systems with long exchange times. It is likely as well that ion binding sites are missed by any structural analysis since ion-binding and conformational flexibility are interdependent. In fact, it is remarkable that Mg2+ ions are predicted so close to experimental binding sites in normal simulation, while they maintain their hydration shell. In comparison to previous studies our dataset contains a large number (24) of structures, yielding rigorous results. Binding site positions and kinetics, as well as the relative influence of different ions, can be studied. Based on our results (e.g. Figure 1) there is no justification for using Na+ ions rather than Mg2+ ions in RNA simulations, unless, as in this work, the purpose of the study is to investigate the difference in RNA properties due to the ‘ion atmosphere’ (37). Further improvement of force fields for RNA, water and ions remains needed to describe the complex energy landscape formed by these flexible biomolecules.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR online.
FUNDING
Swedish Research Council [2013-5947 to D.S.]; eSSENCE e-Science collaboration [to N.M.F.]; Coordination for the Improvement of Higher Education Personnel (CAPES) [to M.D.P.]; Swedish National Infrastructure for Computing (SNIC) at PDC Centre for High Performance Computing (PDC-HPC) [SNIC2016/34-44]. Funding for open access charge: Vetenskapsrådet [2013-5947]; faculty funding.
Conflict of interest statement. None declared.

REFERENCES
1. Draper,D.E. (2008) RNA folding: thermodynamic and molecular descriptions of the roles of ions. Biophys. J., 95, 5489–5495.
2. Aravin,A.A., Hannon,G.J. and Brennecke,J. (2007) The Piwi–piRNA pathway provides an adaptive defense in the transposon arms race. Science, 318, 761–764.
3. Rollins,M.F., Schuman,J.T., Paulus,K., Bukhari,H.S.T. and Wiedenheft,B. (2015) Mechanism of foreign DNA recognition by a CRISPR RNA–guided surveillance complex from Pseudomonas aeruginosa. Nucleic Acids Res., 43, 2216–2222.
4. Misra,V.K. and Draper,D.E. (2001) A thermodynamic framework for Mg2+ binding to RNA. Proc. Natl. Acad. Sci. U.S.A., 98, 12456–12461.
5. Jenner,L., Demeshkina,N., Yusupova,G. and Yusupov,M. (2010) Structural rearrangements of the ribosome at the tRNA proofreading step. Nat. Struct. Mol. Biol., 17, 1072–1078.
6. Brännvall,M. and Kirsebom,L.A. (2001) Metal ion cooperativity in ribozyme cleavage of RNA. Proc. Natl. Acad. Sci. U.S.A., 98, 12943–12947.
7. Bowman,J.C., Lenz,T.K., Hud,N.V. and Williams,L.D. (2012) Cations in charge: magnesium ions in RNA folding and catalysis. Curr. Opin. Struct. Biol., 22, 262–272.
8. Scott,W.G., Finch,J.T. and Klug,A. (1995) The crystal structure of an all–RNA hammerhead ribozyme: a proposed mechanism for RNA catalytic cleavage. Cell, 81, 991–1002.
9. Sigurdsson,S.T. and Eckstein,F. (1995) Structure–function relationships of hammerhead ribozymes: from understanding to applications. Trends Biotechnol., 13, 286–289.
10. Schnabl,J. and Sigel,R.K.O. (2010) Controlling ribozyme activity by metal ions. Curr. Opin. Chem. Biol., 14, 269–275.
11. Wilson,T.J. and Lilley,D.M. (2015) RNA catalysis–is that it? RNA, 21, 534–537.
12. Lilley,D.M.J. (2017) How RNA acts as a nuclease: some mechanistic comparisons in the nucleolytic ribozymes. Biochem. Soc. Trans., 45, 683–691.
13. Zheng,H., Shabalin,I.G., Handing,K.B., Bujnicki,J.M. and Minor,W. (2015) Magnesium–binding architectures in RNA crystal structures: validation, binding preferences, classification and motif detection. Nucleic Acids Res., 43, 3789–3801.
14. Draper,D.E. (2004) A guide to ions and RNA structure. RNA, 10, 335–343.
15. Hayes,R.L., Noel,J.K., Mohanty,U., Whitford,P.C., Hennelly,S.P., Onuchic,J.N. and Sanbonmatsu,K.Y. (2012) Magnesium fluctuations modulate RNA dynamics in the SAM–I riboswitch. J. Am. Chem. Soc., 134, 12043–12053.
16. Auffinger,P. and Westhof,E. (2000) Water and ion binding around RNA and DNA (C,G) oligomers. J. Mol. Biol., 300, 1113–1131.
17. Auffinger,P. and Westhof,E. (2001) Water and ion binding around r(UpA)12 and d(TpA)12 oligomers–comparison with RNA and DNA (CpG)12 duplexes. J. Mol. Biol., 305, 1057–1072.
18. Kirmizialtin,S. and Elber,R. (2010) Computational exploration of mobile ion distributions around RNA duplex. J. Phys. Chem. B, 114, 8207–8220.
19. Manning,G.S. (1978) The molecular theory of polyelectrolyte solutions with applications to the electrostatic properties of polynucleotides. Q. Rev. Biophys., 11, 179–246.
20. Urbanke,C., Römer,R. and Maass,G. (1975) Tertiary structure of tRNAPhe (yeast): kinetics and electrostatic repulsion. Eur. J. Biochem., 55, 439–444.
21. Leroy,J.L., Guéron,M., Thomas,G. and Favre,A. (1977) Role of divalent ions in folding of tRNA. Eur. J. Biochem., 74, 567–574.
22. Römer,R. and Hach,R. (1975) tRNA conformation and magnesium binding. A study of a yeast phenylalanine–specific tRNA by a fluorescent indicator and differential melting curves. Eur. J. Biochem., 55, 271–284.
23. Ha,B.–Y. and Thirumalai,D. (2003) Bending rigidity of stiff polyelectrolyte chains: a single chain and a bundle of multichains. Macromolecules, 36, 9658–9666.
24. Stein,A. and Crothers,D.M. (1976) Equilibrium binding of magnesium(II) by Escherichia coli tRNAfMet. Biochemistry, 15, 157–160.
25. Klein,D.J., Moore,P.B. and Steitz,T.A. (2004) The contribution of metal ions to the structural stability of the large ribosomal subunit. RNA, 10, 1366–1379.
26. Lippert,B. (2000) Multiplicity of metal ion binding patterns to nucleobases. Coord. Chem. Rev., 200, 487–516.
27. Tinoco,I. and Kieft,J.S. (1997) The ion core in RNA folding. Nat. Struct. Biol., 4, 509–512.
28. Ennifar,E., Yusupov,M., Walter,P., Marquet,R., Ehresmann,B., Ehresmann,C. and Dumas,P. (1999) The crystal structure of the dimerization initiation site of genomic HIV–1 RNA reveals an extended duplex with two adenine bulges. Structure, 7, 1439–1449.
29. Correll,C.C., Freeborn,B., Moore,P.B. and Steitz,T.A. (1997) Metals, motifs, and recognition in the crystal structure of a 5S rRNA domain. Cell, 91, 705–712.
30. Petrov,A.S., Bowman,J.C., Harvey,S.C. and Williams,L.D. (2011) Bidentate RNA–magnesium clamps: on the origin of the special role of magnesium in RNA folding. RNA, 17, 291–297.
31. Cooper,D.R., Porebski,P.J., Chruszcz,M. and Minor,W. (2011) X–ray crystallography: assessment and validation of protein–small molecule complexes for drug discovery. Expert Opin. Drug Discov., 6, 771–782.
32. Pozharski,E., Weichenberger,C.X. and Rupp,B. (2013) Techniques, tools and best practices for ligand electron–density analysis and results from their application to deposited crystal structures. Acta Crystallogr., Sect. D: Biol. Crystallogr., 69, 150–167.
33. Philips,A., Milanowska,K., Lach,G., Boniecki,M., Rother,K. and Bujnicki,J.M. (2012) MetalionRNA: computational predictor of metal–binding sites in RNA structures. Bioinformatics, 28, 198–205.
34. Zheng,H., Chordia,M.D., Cooper,D.R., Chruszcz,M., Müller,P., Sheldrick,G.M. and Minor,W. (2014) Validation of metal–binding sites in macromolecular structures with the CheckMyMetal web server. Nat. Protoc., 9, 156–170.
35. Nayal,M. and Di Cera,E. (1996) Valence screening of water in protein crystals reveals potential Na+ binding sites. J. Mol. Biol., 256, 228–234.
36. Gonzalez,R.L. Jr and Tinoco,I. Jr (2001) Identification and characterization of metal ion binding sites in RNA. Methods Enzymol., 338, 421–443.
37. Lipfert,J., Doniach,S., Das,R. and Herschlag,D. (2014) Understanding Nucleic Acid–Ion Interactions. Annu. Rev. Biochem., 83, 813–841.
38. Sponer,J., Otyepka,M., Banáš,P., Réblová,K. and Walter,N.G. (2012) Molecular Dynamics Simulations of RNA Molecules. In: Schlick,T (ed). Innovations in Biomolecular Modeling and Simulations: Complete Set. Royal Society of Chemistry, pp. 129–155.
39. Lange,O.F., van der Spoel,D. and de Groot,B.L. (2010) Scrutinizing molecular mechanics force fields on the submicrosecond timescale with NMR data. Biophys. J., 99, 647–655.
40. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
41. Wild,K., Weichenrieder,O., Leonard,G.A. and Cusack,S. (1999) The 2 Å structure of helix 6 of the human signal recognition particle RNA. Structure, 7, 1345–1352.
42. Klosterman,P.S., Shah,S.A. and Steitz,T.A. (1999) Crystal structures of two plasmid copy control related RNA duplexes: An 18 base pair duplex at 1.20 Å resolution and a 19 base pair duplex at 1.55 Å resolution. Biochemistry, 38, 14784–14792.
43. Shalev,M., Kondo,J., Kopelyanskiy,D., Jaffe,C.L., Adir,N. and Baasov,T. (2013) Identification of the molecular attributes required for aminoglycoside activity against Leishmania. Proc. Natl. Acad. Sci. U.S.A., 110, 13333–13338.
44.
Tanaka,Y., Fujii,S., Hiroaki,H., Sakata,T., Tanaka,T., Uesugi,S., Tomita,K. and Kyogoku,Y. (1999) A–form RNA double helix in the single crystal structure of r(UGAGCUUCGGCUC). Nucleic Acids Res., 27, 949–955. 45. Dallas,A. and Moore,P.B. (1997) The loop E–loop D region of Escherichia coli 5S rRNA: the solution structure reveals an unusual loop that may be important for binding ribosomal proteins. Structure, 5, 1639–1653. 46. Henriksen,N.M., Davis,D.R. and Cheatham,T.E. (2012) Molecular dynamics re–refinement of two different small RNA loop structures using the original NMR data suggest a common structure. J. Biomol. NMR, 53, 321–339. 47. Nikonowicz,E.P., Wang,J., Moran,S. and Donarski,J. (2012) Solution structure of Helix–35 stem–loop from E. coli 23S rRNA. Biol. Magn. Res. Data Bank, doi:10.13018/BMR18549. 48. Stefl,R., Oberstrass,F.C., Hood,J.L., Jourdan,M., Zimmermann,M., Skrisovska,L., Maris,C., Peng,L., Hofr,C., Emeson,R.B. and Allain,F.H.–T. (2010) The solution structure of the ADAR2 dsRBM–RNA complex reveals a sequence–specific readout of the minor groove. Cell, 143, 225–237. 49. Freisz,S., Lang,K., Micura,R., Dumas,P. and Ennifar,E. (2008) Binding of aminoglycoside antibiotics to the duplex form of the HIV–1 genomic RNA dimerization initiation site. Angew. Chem., Int. Ed. Engl., 47, 4110–4113. 50. Pan,B., Mitra,S.N. and Sundaralingam,M. (1999) Crystal structure of an RNA 16–mer duplex R(GCAGAGUUAAAUCUGC)2 with nonadjacent G(syn).A+(anti) mispairs. Biochemistry, 38, 2826–2831. 51. Daldrop,P. and Lilley,D.M.J. (2013) The plasticity of a structural motif in RNA: structural polymorphism of a kink turn as a function of its environment. RNA, 19, 357–364. 52. Stoddard,C.D., Widmann,J., Trausch,J.J., Marcano–Velazquez,J.G., Knight,R. and Batey,R.T. (2013) Nucleotides adjacent to the ligand–binding pocket are linked to activity tuning in the purine riboswitch. J. Mol. Biol., 425, 1596–1611. 53. Johnson,J.E., Reyes,F.E., Polaski,J.T. and Batey,R.T. 
(2012) B12 cofactors directly stabilize an mRNA regulatory switch. Nature, 492, 133–137. 54. Liberman,J.A., Salim,M., Krucinska,J. and Wedekind,J.E. (2013) Structure of a class II preQ1 riboswitch reveals ligand recognition by a new fold. Nat. Chem. Biol., 9, 353–355. 55. Lu,C., Ding,F., Chowdhury,A., Pradhan,V., Tomsic,J., Holmes,W.M., Henkin,T.M. and Ke,A. (2010) SAM recognition and conformational switching mechanism in the Bacillus subtilis yitJ S box/SAM–I riboswitch. J. Mol. Biol., 404, 803–818. 56. Colussi,T.M., Costantino,D.A., Hammond,J.A., Ruehle,G.M., Nix,J.C. and Kieft,J.S. (2014) The structural basis of transfer RNA mimicry and conformational plasticity by a viral RNA. Nature, 511, 366–369. 57. Baba,S., Takahashi,K.–i., Noguchi,S., Takaku,H., Koyanagi,Y., Yamamoto,N. and Kawai,G. (2005) Solution RNA structures of the HIV–1 dimerization initiation site in the kissing–loop and extended–duplex dimers. J. Biochem., 138, 583–592. 58. Tolbert,B.S., Miyazaki,Y., Barton,S., Kinde,B., Starck,P., Singh,R., Bax,A., Case,D.A. and Summers,M.F. (2010) Major groove width variations in RNA structures determined by NMR and impact of 13C residual chemical shift anisotropy and 1H–13C residual dipolar coupling on refinement. J. Biomol. NMR, 47, 205–219. 59. Theimer,C.A., Blois,C.A. and Feigon,J. (2005) Structure of the human telomerase RNA pseudoknot reveals conserved tertiary interactions essential for function. Mol. Cell, 17, 671–682. 60. Davis,J.H., Tonelli,M., Scott,L.G., Jaeger,L., Williamson,J.R. and Butcher,S.E. (2005) RNA helical packing in solution: NMR structure of a 30 kDa GAAA tetraloop–receptor complex. J. Mol. Biol., 351, 371–382. 61. Burke,J.E., Sashital,D.G., Zuo,X., Wang,Y.–X. and Butcher,S.E. (2012) Structure of the yeast U2/U6 snRNA complex. RNA, 18, 673–683. 62. Kim,N.–K., Zhang,Q. and Feigon,J. (2014) Structure and sequence elements of the CR4/5 domain of medaka telomerase RNA important for telomerase function. Nucleic Acids Res., 42, 3395–3408. 63. 
Bonneau,E. and Legault,P. (2014) Nuclear magnetic resonance structure of the III–IV–V three–way junction from the Varkud satellite ribozyme and identification of magnesium–binding sites using paramagnetic relaxation enhancement. Biochemistry, 53, 6264–6275. 64. Cash,D.D., Cohen–Zontag,O., Kim,N.–K., Shefer,K., Brown,Y., Ulyanov,N.B., Tzfati,Y. and Feigon,J. (2013) Pyrimidine motif triple helix in the Kluyveromyces lactis telomerase RNA pseudoknot is essential for function in vivo. Proc. Natl. Acad. Sci. U.S.A., 110, 10970–10975. 65. Auffinger,P., Cheatham,T.E. and Vaiana,A.C. (2007) Spontaneous formation of KCl aggregates in biomolecular simulations: a force field issue? J. Chem. Theory Comput., 3, 1851–1859. 66. Allne´r,O., Nilsson,L. and Villa,A. (2012) Magnesium ion–water coordination and exchange in biomolecular simulations. J. Chem. Theory Comput., 8, 1493–1502. 67. Gebala,M., Giambasu,G.M., Lipfert,J., Bisaria,N., Bonilla,S., Li,G., York,D.M. and Herschlag,D. (2015) Cation-anion interactions within the nucleic acid ion atmosphere revealed by ion counting. J. Am. Chem. Soc., 137, 14705–14715. 68. Yoo,J. and Aksimentiev,A. (2012) Improved parametrization of Li+, Na+, K+, and Mg2 + ions for all-atom molecular dynamics simulations of nucleic acid systems. J. Phys. Chem. Lett., 3, 45–50. 69. Wang,J., Cieplak,P. and Kollman,P.A. (2000) How well does a restrained electrostatic potential (resp) model perform in calculating conformational energies of organic and biological molecules. J. Comp. Chem., 21, 1049–1074. 70. Pronk,S., Pa´ll,S., Schulz,R., Larsson,P., Bjelkmar,P., Apostolov,R., Shirts,M.R., Smith,J.C., Kasson,P.M., van der Spoel,D. et al. (2013) GROMACS 4.5: a high–throughput and highly parallel open source molecular simulation toolkit. Bioinformatics, 29, 845–854. 71. Hess,B., Bekker,H., Berendsen,H.J.C. and Fraaije,J.G.E.M. (1997) LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem., 18, 1463–1472. 72. Hess,B. 
(2008) P–LINCS: a parallel linear constraint solver for molecular simulation. J. Chem. Theory Comput., 4, 116–122. 73. Darden,T., Pearlman,D. and Pedersen,L.G. (1998) Ionic charging free energies: Spherical versus periodic boundary conditions. J. Chem. Phys., 109, 10921–10935. 74. Bussi,G., Donadio,D. and Parrinello,M. (2007) Canonical sampling through velocity rescaling. J. Chem. Phys., 126, 014101. 75. Parrinello,M. and Rahman,A. (1981) Polymorphic transitions in single crystals: a new molecular dynamics method. J. Appl. Phys., 52, 7182–7190. 76. Bottaro,S., Di Palma,F. and Bussi,G. (2014) The role of nucleobase interactions in RNA structure and dynamics. Nucleic Acids Res., 42, 13306–13314. 77. Luzar,A. and Chandler,D. (1996) Effect of environment on hydrogen bond dynamics in liquid water. Phys. Rev. Lett., 76, 928–931. Downloaded from https://academic.oup.com/nar/article-abstract/46/10/4872/4990003 by guest on 31 May 2018 4882 Nucleic Acids Research, 2018, Vol. 46, No. 10 78. van der Spoel,D., van Maaren,P.J., Larsson,P. and Tˆımneanu,N. (2006) Thermodynamics of hydrogen bonding in hydrophilic and hydrophobic media. J. Phys. Chem. B., 110, 4393–4398. 79. Larsson,D.S.D. and van der Spoel,D. (2012) Screening for the location of RNA using the chloride Ion distribution in simulations of virus capsids. J. Chem. Theory Comput., 8, 2474–2483. 80. Jeszeno˝ i,N., Horva´th,I., Ba´lint,M., van der Spoel,D. and Hete´nyi,C. (2015) Mobility-based prediction of hydration structures of protein surfaces. Bioinformatics, 31, 1959–1965. 81. Jeszeno˝ i,N., Ba´lint,M., Horva´th,I., van der Spoel,D. and Hete´nyi,C. (2016) Exploration of interfacial hydration networks of target-ligand complexes. J. Chem. Inf. Model, 56, 148–158. 82. Perez,A., Marchan,I., Svozil,D., Sponer,J., Cheatham,T.E. III, Laughton,C.A. and Orozco,M. (2007) Refinement of the AMBER force field for nucleic acids: improving the description of [alpha]/[gamma] conformers. Biophys. J., 92, 3817–3829. 83. 
Banas,P., Hollas,D., Zgarbova,M., Jurecka,P., Orozco,M., Cheatham,T.E. III, Sponer,J. and Otyepka,M. (2010) Performance of molecular mechanics force fields for RNA simulations. Stability of UUCG and GNRA hairpins. J. Chem. Theory Comput., 6, 3836–3849. 84. Zgarbova,M., Otyepka,M., Sponer,J., Mladek,A., Banas,P., Cheatham,T.E. III and Jurecka,P. (2011) Refinement of the Cornell et al. nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles. J. Chem. Theory Comput., 7, 2886–2902. 85. Cowan,J.A., Huang,H.-W. and Hsu,L.-Y. (1993) Sequence selective coordination of Mg2+ (aq) to DNA. J. Inorg. Biochem., 52, 121–129. 86. Yoo,J. and Aksimentiev,A. (2012) Competitive binding of cations to duplex DNA revealed through molecular dynamics simulations. J. Phys. Chem. B, 116, 12946–12954. 87. Lemkul,J.A., Lakkaraju,S.K. and MacKerell,A.D. (2016) Characterization of Mg2+ distributions around RNA in solution. ACS Omega, 1, 680–688. Downloaded from https://academic.oup.com/nar/article-abstract/46/10/4872/4990003 by guest on 31 May 2018 ANEXO A. Trabalhos desenvolvidos ao longo do doutoramento 222 A.3 Dynamics of Membrane-Embedded Lipid-Linked Oligosaccharides for The Three Domains of Life Oligossacarídeos ligados à lipídeos (LLOs) são substratos de oligossacariltransferases (OSTs), enzimas que catalizam em bloco a transferência de uma cadeia glicada durante o processo de N-glicosilação. LLOs são compostos de uma cadeia isoprenóide e um oligossacarídeo, ligados por um ou mais grupos pirofosfatos (PP). A parte lipídica do LLO em eukaria e archea é o dolicol, enquanto para prokarya é o undecaprenol, mas o número de unidades isoprenóides pode variar entre espécies. Nesse trabalho, os LLOs foram parametrizados utilizando parâmetros de lipídeos já encontrados no campo de força GROMOS53A6, juntamente com os parâmetros do GROMOS53A6GLYC para descrever as porções sacarídicas. 
The parameters describing the torsions of the isoprenoid portions were derived from quantum calculations at the HF/6-31G* level, while the final topology was validated against condensed-phase thermodynamic properties such as density and enthalpy of vaporization. Molecular dynamics simulations were carried out to understand the structure and dynamics of LLOs from eukarya (Glc3-Man9-GlcNAc2-PP-Dolichol), prokarya (Glc1-GalNAc5Bac1-PP-Undecaprenol) and archaea (Glc1-Man1-Gal1-Man1-Glc1-Gal1-Glc1-P-Dolichol) anchored to lipid membranes. The simulations revealed that, in general, the saccharide portions interact with the phosphate groups of the membrane, together with the PPs of the LLOs. In addition, the LLOs of the three domains of life show similarities in orientation, preferred structures and dynamics along the membrane. The results obtained here describe the conformational preferences of LLOs for future studies of their complexation with OSTs, enabling investigation of the transfer of the oligosaccharide portion to the glycosylation acceptor protein. This work is in the process of being submitted.

Manuscript submitted to Biophysical Journal

Article

Dynamics of Membrane-Embedded Lipid-Linked Oligosaccharides for The Three Domains of Life

Pablo Ricardo Arantes1, Conrado Pedebos1,2, Laércio Pol-Fachin3, Marcelo Depolo Polêto1, and Hugo Verli1,*

1Centro de Biotecnologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
2School of Pharmacy, University of Nottingham, University Park, Nottingham, U.K.
3Centro Universitário CESMAC, Maceió, AL, Brazil
*Correspondence: hverli@cbiot.ufrgs.br

ABSTRACT
Lipid-linked oligosaccharides (LLOs) are the substrates of oligosaccharyltransferases (OSTs), enzymes that catalyze the en bloc transfer of a glycan chain during the process of N-glycosylation. LLOs are composed of an isoprenoid chain moiety and an oligosaccharide, linked by one or more pyrophosphate (PP) groups.
The LLO lipid component is a dolichol in eukarya and archaea, and an undecaprenol in prokarya. Additionally, the number of isoprene units may vary between species. Obtaining models of LLOs from the different domains of life embedded in biological membranes, able to describe their 3D structure and dynamics, is a required step toward further studies of their complexation with and processing by OSTs. The GROMOS53A6 force field was employed, supplemented with GROMOS53A6GLYC parameters for the saccharide moiety. The torsional parameters for the isoprenoid portion were derived from a fit to quantum mechanical potential energy profiles at the HF/6-31G* level and validated against experimental condensed-phase properties. Molecular dynamics simulations were performed with the GROMACS package to assess the orientation, structure, and dynamics of eukaryotic (Glc3-Man9-GlcNAc2-PP-Dolichol), bacterial (Glc1-GalNAc5-Bac1-PP-Undecaprenol) and archaeal (Glc1-Man1-Gal1-Man1-Glc1-Gal1-Glc1-P-Dolichol) LLOs in membrane bilayers. The topologies obtained for the isoprenoid group properly reproduced experimental thermodynamic properties, such as enthalpy of vaporization and density. The microsecond molecular dynamics simulations of LLOs revealed that most carbohydrate residues interact with the membrane lipid head groups, while the PP linkages lie within the lipid head group region and the isoprenoid chains lie within the bilayer. Overall, there are similarities in the orientation, structure, and dynamics of the eukaryotic, bacterial and archaeal LLOs in bilayers. The preferred orientation, structure and dynamics of the LLOs provide information on their complexation with OSTs, allowing further studies of how these enzymes catalyze the transfer of the oligosaccharide chain to an acceptor protein.

INTRODUCTION
The glycosylation of asparagine residues is the predominant protein modification throughout all three domains of life (1).
This post-translational modification, called N-glycosylation, is important in many aspects of biology and affects various properties of proteins, including folding, conformation, and solubility (1, 2). N-glycosylation occurs at the consensus motif (referred to as the sequon) Asn-Xaa-Ser/Thr, where Xaa can be any residue except Pro (3, 4). Lipid-linked oligosaccharides (LLOs) are the substrates of oligosaccharyltransferase (OST), which catalyzes the transfer of the oligosaccharide onto the acceptor asparagine of nascent proteins (5). The LLO molecule can be subdivided into two main parts: the isoprenoid chain moiety and the oligosaccharide chain. The link between these two portions is composed of one or two pyrophosphate groups (6). The lipid component of LLOs varies between species, as does the number of isoprene units. In eukarya and archaea, the isoprenoid moiety is a dolichol, while in bacteria it is an undecaprenol. In eukarya, the oligosaccharide portion of the LLO is more conserved, consisting of a Glc3-Man9-GlcNAc2 chain, while in the other domains of life it is more diverse. Despite all the differences in the structures of these molecules, the oligosaccharide transfer mechanism seems to be conserved in all domains of life (7). In bacteria, the final step of N-glycosylation is performed by a single-subunit oligosaccharyltransferase (PglB) (8). In most eukaryotes, however, the same step is carried out by a large, multi-subunit, membrane-embedded OST complex, usually containing eight subunits in yeast and possibly more in multicellular organisms. Recently, the structure of the yeast OST was characterized by single-particle cryo-EM (9), revealing a conserved subunit arrangement.
In this structure, the catalytic Stt3 subunit displays full access to its substrates, since its active site is not located in the inner area of the complex. The dolichol-pyrophosphate portion of the eukaryotic LLO binds to a hydrophobic groove of Stt3, as in other OSTs (10, 11), while the glycan chain binds to the pocket formed at the interface of both domains of the Stt3 unit, a pocket that is also formed by two noncatalytic subunits and an ordered N-glycan. This implies that these LLO molecules must populate conformations in which the sugar moieties are parallel to the membrane (12). In this context, there is only one previous computational report (12) that assessed this information at the atomic level (considering only the bacterial and the eukaryotic LLOs), which demonstrated the importance of the behavior of the oligosaccharide chain and of its orientation with respect to the bilayer before binding.

In our work, we employed a combination of quantum mechanical (QM) calculations, molecular modelling techniques and molecular dynamics (MD) simulations to study conformational properties, such as the orientation and the dynamics, of LLOs from all domains of life:
i) a eukaryotic LLO (Glc3-Man9-GlcNAc2-PP-Dolichol; G3M9Gn2-PP-Dol; Fig. 1A) embedded in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) membrane;
ii) a bacterial LLO (Glc1-GalNAc5Bac1-PP-Undecaprenol; G1Gn5B1-PP-Und; Fig. 1B) embedded in a 1-palmitoyl-2-oleoyl-phosphatidylethanolamine (POPE) membrane;
iii) an archaeal LLO (Glc1-Man1-Gal1-Man1-Glc1-Gal1-Glc1-P-Dolichol; G1M1Gal1M1G1Gal1G1-P-Dol; Fig. 1C) embedded in a POPE membrane.
The eukaryotic, bacterial and archaeal LLOs without their glycan chains were also analyzed. The parameters of one torsional angle were refined, a correction needed to better reproduce some experimental properties of the isoprenoid unit.
In addition, novel parameters for the sulfate groups in the archaeal LLO were generated. Analysis of our microsecond MD simulations showed the expected positions for each moiety composing the LLOs, as well as high flexibility of the hydrophobic tails and glycan chains oriented parallel to the membrane. This allowed us to identify clusters of conformations that were submitted to molecular docking calculations with the available crystallographic structures for all domains of life. We generated enzyme-ligand poses that were in agreement with the expected crystal structure distances in all cases, achieving the first set of full OST complexes for the eukaryal and the archaeal enzymes.

MATERIALS AND METHODS

Nomenclature and software
The IUPAC recommendations for nomenclature and symbols were used. For the MD simulations, the GROMACS 5.0.7 simulation suite (13) was employed, along with the GROMOS 53A6 force field (14) for the systems without the glycan chain and the GROMOS 53A6GLYC force field (15, 16) for the glycan chain. A (1→X) glycosidic linkage, where X is 2, 3, 4 or 6, has its ϕ and ψ angles defined as:

$$\phi_{(1\to X)} = \mathrm{O5 - C1 - OX - CX} \qquad (1)$$

$$\psi_{(1\to X)} = \mathrm{C1 - OX - CX - C(X-1)} \qquad (2)$$

Finally, for a (1→6) linkage, ω is defined as:

$$\omega = \mathrm{O6 - C6 - C5 - C4} \qquad (3)$$

For the manipulation and visualization of structures, VMD and PyMOL were employed.

Generation of new torsional parameters for the isoprenoid chain and sulfate group
The QM torsional profiles for the dihedrals within the isoprenoid structure and the sulfate groups were obtained using Gaussian03 (?). For the former, the 3-methylpent-2-ene molecule was used due to its similarity to the isoprenoid portion, while an entire sulfate group was employed for the latter.
These QM calculations were carried out using the scan routine combined with the tight convergence criterion, at the HF level with the 6-31G* basis set, obtaining the relative energy associated with the rotation of the dihedral in increments of 30°. Analogous MM calculations were performed in GROMACS 5.0.7, using the 53A6 force field parameter set, as described in Pol-Fachin et al. (15, 16), for both molecules, and CHARMM36 (17) for comparison purposes. The QM and MM profiles were fitted in the Rotational Profiler server (18), providing torsional parameters with which the MM calculations yield a torsional profile similar to the QM one. These new parameters were then implemented in the LLO topologies for MD simulations.

Figure 1: Schematic representations of the (A) eukaryotic, (B) bacterial and (C) archaeal LLOs used in this study.

Parametrization strategy and topology construction
In order to describe LLOs through molecular mechanics techniques, a set of lipids commonly found in membranes was selected to act as building blocks in this work. Our parametrization strategy was based on accurately reproducing experimental values for physical-chemical properties of liquids. Topologies were constructed for LLOs using the potentials for bond stretching, bond-angle bending, improper dihedral deformation and proper dihedrals, as well as the van der Waals interaction terms, retrieved directly from the GROMOS53A6 (14) set, while for the diphosphate linkage these potentials were retrieved from a previous work (15, 16). Glycan chain topologies were constructed using the potentials for bond stretching, bond-angle bending, and improper dihedral deformation, as well as the van der Waals interaction terms, retrieved directly from the GROMOS53A6GLYC force field (15, 16).
Such models had their glycosidic linkage geometries adjusted to the main conformational states for each linkage, based on their relative abundance in the isolated disaccharides in water, as previously described (19, 20). Charge adjustments were made to properly reproduce the experimental properties under MM conditions, both for the LLOs and for the sulfate groups. All MD simulations and analyses were performed using the GROMACS simulation suite (21), version 5.0.7 (22).

Liquid and gas-phase simulations for assessment of thermodynamic properties
Physical-chemical properties of organic liquids (density and enthalpy of vaporization) were used as targets to validate the topology, as in previous works on the parametrization of small biomolecules (23–25) and on the benchmarking of force fields (26). The geraniol fragment was chosen because experimental values of density and enthalpy of vaporization are available for it, and the topology was accepted as useful when the absolute error between experimental and simulated properties was below 15%. To calculate the thermodynamic properties of the organic liquid and thus validate our topology construction, the liquid phase was induced by simulating 125 molecules under 100 bar and scaling the simulation box by 2×2×2. All simulations were carried out with the Berendsen pressure and temperature coupling algorithms (27), using τT = 0.2 ps and τP = 0.5 ps. Experimental values of isothermal compressibility and dielectric constant were used as additional parameters for the liquid simulations when available (14, 28). Otherwise, the compressibility of the most chemically similar molecule was used. Liquid-phase and gas-phase simulations were carried out for 10 ns and 100 ns, respectively. The potential energies of these systems (Epot(g) for the gas phase and Epot(l) for the liquid phase) were extracted and used to calculate (Eq. 4) the enthalpy of vaporization (∆Hvap) of the fragments.
$$\Delta H_{vap} = \left(E_{pot}(g) + k_B T\right) - E_{pot}(l) \qquad (4)$$

Organic liquid densities (ρ) were calculated from the liquid-phase simulations using block averages over 5 blocks, as for ∆Hvap. MD simulations were carried out with the GROMACS 5.0.7 package, and all analyses employed dedicated tools from the GROMACS package, together with in-house scripts to calculate thermodynamic properties.

Membrane insertion of LLOs
First, the initial LLO structures were oriented along the Z axis, and for their insertion into membranes we employed the InflateGRO methodology (29). Briefly, this consists of inserting the LLOs into a pre-equilibrated bilayer patch, with lipids overlapping the LLOs, and then expanding the dimensions of the box while translating all lipids laterally, so that no overlap remains. After that, a series of minimization steps is performed, with the box dimensions being compressed and the lipids translated back toward the center of the system, until the system reaches the desired density. Throughout this process, the LLOs are kept under strong position restraints, so that their structures are not affected. A bilayer model of 120 POPE lipids for the bacterial and archaeal systems, and a bilayer of 120 POPC lipids for the eukaryotic system, were obtained at the end of the protocol (Figure 2).

Figure 2: Initial models of the eukaryotic, bacterial, and archaeal LLOs embedded in membrane bilayers. Water molecules are omitted in this representation.

Molecular Dynamics Simulations
Following the membrane insertion steps, the rectangular box was solvated with the SPC water model (30) under periodic boundary conditions. Before solvation, following common practice, the van der Waals radius of the C atom was raised from 0.15 to 0.375 to prevent water molecules from filling any remaining space between the lipids. After solvation, this parameter was restored to its original value.
Counter ions were added to neutralize the systems when needed. The LINCS algorithm (31) was chosen to constrain covalent bond lengths, allowing an integration step of 2 fs. Electrostatic interactions were calculated with the particle mesh Ewald (PME) method (32). The barostat chosen was Parrinello-Rahman (33, 34), with a 2.0 ps coupling constant, while the thermostats chosen were V-rescale (NVT step) (35) and Nosé-Hoover (NPT equilibration and production MD) (36, 37), with a coupling constant of τ = 0.5. Additionally, semi-isotropic pressure coupling was applied to account for the presence of the membrane. A constant temperature of 310 K (eukaryotic and bacterial systems) or 353 K (archaeal systems) and a constant pressure of 1 atm were used. The steepest descent algorithm was used for the energy minimizations. First, two equilibration simulations were performed with position restraints: an NVT run of 2 ns and an NPT run of 50 ns. Subsequently, one-microsecond unrestrained NPT MD simulations were performed for each of the six systems, generating the production runs from which data were collected for analysis.

RESULTS AND DISCUSSION
Simulation results are presented and discussed for the conformational features of the eukaryotic, bacterial and archaeal LLOs in their respective membrane bilayers. The systems without the glycan portion are presented for further comparison. After that, we discuss the molecular docking calculations obtained by employing the LLO models described in this work.

Torsional potential and force field calibration
While some of the torsional potentials for the LLOs were preserved from the GROMOS 53A6 parameter set, the conformation of the isoprenoid chain was re-evaluated by fitting to QM data, owing to the previously reported difficulties of the 53A6 parameters in reproducing isoprenoid conformational properties.
The functional form of the potential energy term associated with the torsion around dihedral angle m is given by:

$$V_{\phi,m} = k_{\phi,m}\left[1 + \cos\delta_m \cos(n_m \phi_m)\right] \qquad (5)$$

where ϕm is the dihedral angle value, nm the multiplicity of the term, δm the associated phase shift, and kϕ,m the corresponding force constant. It is worth noting that a given dihedral angle may be involved in more than one torsional potential energy term, with different multiplicities and/or phase shifts. Accordingly, the classical energy profiles obtained from rotation of the 3-methylpent-2-ene dihedral angle were compared to the energy profiles obtained from QM calculations, as presented in Figure 3. The analysis showed important divergences between the QM-calculated and the CHARMM36/GROMOS53A6 energy profiles for this dihedral angle (Figure 3). Although the current GROMOS 53A6 parameters account for two minimum-energy geometries, at −120° and 120°, the conformational barriers they yield are not high enough to properly describe the rotation of this dihedral. Additionally, the 0° conformer is described as the global energy maximum, instead of a local minimum (Figure 3). The barriers obtained with the CHARMM36 force field parameters also diverge from the QM profile, with two minima at −180° and 180°. The −120°, 0° and 120° conformers are described as energy maxima, instead of local minima (Figure 3). On the basis of these data, three new torsional potentials associated with the rotation of this dihedral angle were obtained by fitting the corresponding classical energy profiles to the energy profiles obtained from QM calculations (Figure 3). The resulting potentials (Table 1) were shown to adequately reproduce the QM energy profile for this torsion. Organic liquid simulations were performed in order to validate the parameters used in the topologies of the LLO building blocks.
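As an illustration of the torsional term in Eq. 5, the sketch below evaluates a sum of such terms over a dihedral scan in 30° increments, the same spacing used for the QM scan. It is a minimal sketch only: the function name, the single three-fold example term and its force constant are hypothetical and are not taken from the manuscript's topology files, and the phase shift is assumed to enter only through cos δ.

```python
import math

def torsion_energy(phi_deg, terms):
    """Sum of terms k * [1 + cos(delta) * cos(n * phi)], the form of Eq. 5.

    phi_deg : dihedral angle in degrees
    terms   : iterable of (k, delta_deg, n) tuples (hypothetical values here)
    """
    phi = math.radians(phi_deg)
    return sum(
        k * (1.0 + math.cos(math.radians(delta_deg)) * math.cos(n * phi))
        for k, delta_deg, n in terms
    )

# Hypothetical single three-fold term: k = 1.0 (arbitrary energy units), delta = 0
example_terms = [(1.0, 0.0, 3)]

# Scan the dihedral from -180 to 180 degrees in 30-degree increments
profile = [(phi, torsion_energy(phi, example_terms)) for phi in range(-180, 181, 30)]

for phi, energy in profile:
    print(f"{phi:5d}  {energy:7.3f}")
```

Because a dihedral may carry several terms with different multiplicities, the function takes a list of terms; a fitted parameter set would be passed in the same way, in whatever energy units the force field uses.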
This validation strategy, based on comparison to experimental thermodynamic properties of the condensed phase (ρ and ∆Hvap), was employed in other works involving the parametrization of molecules in the GROMOS force field (23) and for the establishment of force field benchmarks (26). Individually, most of the parametrized molecules yielded values in good agreement with experimental data (Table 2), which indicates a proper set of potentials to reproduce experimental energies.

LLOs dynamics and spatial distribution within membrane bilayers

The density profiles calculated along the membrane normal (corresponding to the Z axis) indicated that, in general, the saccharidic residues remain above the lipid head groups, while the pyrophosphate (PP) group mostly populates the same area as the lipid head groups, and the hydrophobic chain (DolPP, DolP, and UndPP) fluctuates inside the membrane bilayer. We detected a generally asymmetric behavior for this apolar chain, with the highest peaks of density being found in the middle of the two leaflets, along with a higher population of the upper leaflet area. The carbon at the end of the LLO tail highlights the flexibility of this moiety, as it populates distinct areas of the bilayer, with a higher density between the center of the membrane and the lower leaflet. These results are similar to those obtained for the LLOs without glycan chains (Figure S1).

Figure 3: Comparison of energy profiles calculated at classical (GROMOS53A6/CHARMM36) and QM (HF/6-31G*) levels in the gas phase. The fitting has generated parameters that reproduce well the QM potential energy in the MM calculations.

Table 1: Torsional parameters obtained based on QM calculations
Compound              δ     kϕ,m     n
3-Methylpent-2-ene    0     5.97     0
                      0    -3.67     1
                      0    -5.17     3
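The density profiles along the membrane normal discussed in this section can be illustrated with a short Python sketch. It is not the analysis tool used in this work: the function name is hypothetical, and it assumes a fixed box length along Z, a fixed lateral area, and Z coordinates pooled over all frames. The optional `scale` argument mirrors the visual scaling factors mentioned in the Figure 4 caption.

```python
def density_profile(z_coords, box_z, area_xy, n_frames, n_bins=50, scale=1.0):
    """Number density in slabs along Z, averaged over frames.

    z_coords : Z coordinates of the selected atoms, pooled over all frames
    box_z    : box length along Z (assumed constant)
    area_xy  : lateral box area (assumed constant)
    scale    : plotting factor, e.g. 5 or 10 for sparsely populated groups
    """
    width = box_z / n_bins
    counts = [0] * n_bins
    for z in z_coords:
        idx = int((z % box_z) // width)  # wrap the coordinate into the box
        counts[min(idx, n_bins - 1)] += 1
    slab_volume = area_xy * width
    centers = [(i + 0.5) * width for i in range(n_bins)]
    density = [scale * c / (slab_volume * n_frames) for c in counts]
    return centers, density
```

Calling the function once per group (head groups, acyl chains, isoprenoid chain, and so on) with the same binning gives curves that can be overlaid on a single plot.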
Figure 4: Density profiles for the subdivided parts of the studied LLOs along the membrane normal (Z axis) of the membrane bilayers: membrane head groups (yellow), lipid acyl chains (black), isoprenoid chain (blue), oligosaccharide (purple), pyrophosphate linkage (red), and the last carbon atom of the isoprenoid chain (green). The oligosaccharide distribution was scaled by a factor of five to facilitate the observation of data. The same procedure was applied to the pyrophosphate linkage, dolichol chain, and dolichol last carbon distributions, but using a scale factor of ten.

Table 2: Thermodynamic properties obtained for simulated geraniol
Compound                 Geraniol
Temp. [K]                298.15 (38)
Exp. ρ [g/cm3]           0.89 (39)
Calc. ρ [g/cm3]          0.89
Error                    0%
Exp. ∆Hvap [kJ/mol]      58.83 (38)
Calc. ∆Hvap [kJ/mol]     64.24
Error                    9.33%

To better understand the motions of the LLO tail, we plotted the distance distributions between the first carbon (C1A - all simulations) and the middle carbon (C9E - Eukarya, C6E - Bacteria, and C8E - Archaea), and between the first carbon (C1A - all simulations) and the last carbon (C19E - Eukarya, C11E - Bacteria, and C12E - Archaea) of the apolar chain. The distributions found for the eukaryotic and the bacterial LLOs are similar to the ones found in a previous report (12): a more restricted area of occupation for the first half of the chain (Figure 5 - red line), especially in the bacterial molecule, mainly due to its reduced length, while the full-length distance distribution displays a pronounced flexibility, with a 10 Å difference in the stretch of the eukaryotic LLO. Interestingly, the first half of the archaeal LLO chain shows a distribution pattern comparable to that of the eukaryotic LLO, despite differing by almost half in the number of isoprenoid units.
Possibly, the absence of one of the phosphate groups results in increased flexibility of this moiety. Additionally, the different types of bonds and their spatial organization in the archaeal hydrophobic chain could also contribute to this behavior. When comparing the complete structures (Figure 5B) with the LLOs without the glycan chains (Figure S2B), the eukaryotic and archaeal LLOs presented significant differences in the LLO tail. This behavior demonstrates the influence of the sugar portion on the dynamics of these glycoconjugates. The presence of the glycan chains and their interactions with membranes could facilitate the binding to OSTs, other glycosyltransferases (12), or other proteins, such as the ABC transporter Protein Glycosylation Locus K (PglK), responsible for the translocation of LLOs that serve as donors in N-linked protein glycosylation in Bacteria (40).

Figure 5: Distributions of the distances between different carbons of the LLOs hydrophobic tail. A) Histograms of the distances between carbons C1A and C9E (red line) and carbons C1A and C19E (blue line) for every LLO. B) Distribution of distance values between carbons C1A and C9E (red dots) and carbons C1A and C19E (blue dots) along the X and Y axes.

Orientation, structure, and dynamics of the oligosaccharide component in membrane bilayers

Aiming to properly describe the motion of the oligosaccharidic portion of the LLO, we analyzed different aspects of its structure.
The tilt angle of this region, which describes the movement of the oligosaccharide relative to the membrane bilayer, is described by a vector connecting distant monosaccharides, as follows: for the bacterial LLO, the monosaccharide nearest to the reducing end (Bac) and the one at the non-reducing end (NAc-Galactosamine) were selected; for the archaeal LLO, two measurements were performed (due to the branching of the glycan), one using residues 1 and 4’ (red line) and the other using residues 1 and 4 (blue line); for the eukaryotic LLO, two measurements were also taken, using residues 1 and 3’ (red line) and residues 1 and 3 (blue line). As shown in Figure 6A, the main vector indicates that the preferred position for all glycan chains is parallel to the bilayer (~90 degrees), as previously observed (12). These observations are further confirmed by the average positions of the saccharidic residues relative to the Z axis (Figure 6B), showing that the majority of the monosaccharide residues are close to the membrane. It is noticeable that the archaeal glycan chain appears to be more flexible in terms of this interaction, showing higher fluctuations in the positions of its residues. Again, this might be a result of the absence of one phosphate group, which decreases the strength of the interactions with the lipid head groups and allows an extended motion for this moiety. Correspondingly, the same behavior is depicted by the residence time of all saccharidic residues at the membrane (Figure 6C). Most residues interact with the membrane during the whole simulation time, with few minor exceptions identified in both Archaea (residues 4, 5, and 3’) and Eukarya (residues 3, 4, and D3). Possibly, the degree of branching influenced the residence time, as a larger number of branches was associated with a greater decrease in this property for some monosaccharide residues.
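The tilt-angle analysis described above can be sketched in a few lines of Python. The snippet is only an illustration, not the analysis code used in this work: it assumes that the membrane normal coincides with the Z axis, takes hypothetical coordinates for the two monosaccharides that define the vector, and converts a histogram of tilt angles into a free-energy profile with ∆G = -kB T ln H(x), the relation quoted in the Figure 6 caption. The function names and the 10-degree bin width are arbitrary choices.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def tilt_angle(start, end):
    """Angle in degrees between the start->end vector and the Z axis."""
    dx, dy, dz = (e - s for s, e in zip(start, end))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    cos_t = max(-1.0, min(1.0, dz / norm))  # guard against rounding
    return math.degrees(math.acos(cos_t))


def free_energy_profile(angles, temperature, bin_width=10.0):
    """Boltzmann inversion of a tilt-angle histogram over 0-180 degrees.

    Returns {bin_center: dG} with dG = -kB * T * ln(H), shifted so that
    the most populated bin sits at zero. Empty bins are skipped.
    """
    n_bins = int(round(180.0 / bin_width))
    counts = [0] * n_bins
    for a in angles:
        idx = min(int(a // bin_width), n_bins - 1)  # 180 goes in last bin
        counts[idx] += 1
    total = float(len(angles))
    dg = {
        (i + 0.5) * bin_width: -KB * temperature * math.log(c / total)
        for i, c in enumerate(counts)
        if c > 0
    }
    lowest = min(dg.values())
    return {center: value - lowest for center, value in dg.items()}
```

A vector lying in the membrane plane gives 90 degrees, so a glycan chain that stays parallel to the bilayer produces a free-energy minimum near that value.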
The final positions of the LLO structures after 1 µs of simulation (Figure 7) provide a general model for how these molecules are embedded in the membrane bilayers and how their conformations could be depicted.

Figure 6: Tilt angles calculated for the vectors describing the motion of the glycan chains of the LLOs. A) Free-energy surfaces calculated from probability density histograms converted using $\Delta G = -k_B T \ln H(x)$. The blue line describes the vector connecting 1 and 4 in Eukarya, 1 and 6 in Bacteria, and 1 and 3 in Archaea. The red line describes the vector connecting residues 1 and 4’ in Eukarya, and 1 and 3’ in Archaea. B) Positions of each sugar residue relative to the Z axis, represented by blue dots (average position in blue, error bars in red). C) Residence time of each sugar residue at the membrane interface, calculated using a cutoff of 4.5 Å.

Figure 7: Final conformations adopted by the eukaryotic, the bacterial and the archaeal LLOs after 1 µs simulations. The predominant orientation of the glycan chain relative to the membrane is observed, as well as a nonspecific behavior for the different hydrophobic chains.

Molecular Docking of LLOs in each OST model

By exploring all the generated snapshots of each simulation, we were able to identify clusters of structures that represent the most abundant models for all LLO molecules. After identifying these clusters, we submitted the most prevalent structures to the PatchDock molecular docking server against the corresponding OST counterpart, employing distance restraints to increase the accuracy of our results. With this methodology, we obtained at least one complex that respected the expected distances (Figure 8) and was located around the expected donor cavity.
For the bacterial LLO, the molecule mostly displayed an orientation similar to that seen in the recent crystal structure of the full complex (PDB ID 5OGL) (41), but with a shorter distance between the peptide acceptor and the LLO donor (3.5 Å vs 4.5 Å in the crystallographic structure). The undecaprenyl portion fitted in the previously identified hydrophobic groove, while the saccharidic moiety interacted along the cavity formed at the interface between the TM domain and the PP domain. The archaeal LLO docking revealed an interesting feature for the WWDXGX motif, which in A. fulgidus AglB (PDB ID 5GMY) (42) is composed of WWDYGH. His555 plays the role of Tyr468 in the WWDYGY motif of C. jejuni PglB, performing electrostatic interactions between its imidazole side-chain ring and the sulfate connected to C2 of the reducing-end monosaccharide of the archaeal LLO. These observations indicate that the WWDXGX motif is directly involved in the selectivity for the donor substrate of OSTs and that the enzyme possibly evolved to better accommodate specific glycan ligands. A similar residue substitution occurs in the archaeal enzyme, where residue Tyr79 in C. lari becomes His81 in A. fulgidus. The position occupied by this amino acid, in close proximity to the sulfate group connected to C2 of the first monosaccharide, suggests that His81 (and Tyr79 in C. lari) provides an additional interaction that assures correct positioning of the LLO inside the enzyme cavity. Besides that, another residue, Trp215, was observed to interact with the LLO phosphate groups, performing a role similar to that seen in the enzymes from the other domains of life (Trp208 in Eukarya and Tyr196 in Bacteria) (41, 43), which explains the decrease in enzyme activity when it was mutated to alanine (11). The remaining carbohydrate residues of the molecule are in contact with the interface of the AglB domains, where the last two monosaccharides are located near the N-terminal region of EL5.
The eukaryotic LLO docking was performed on the single catalytic subunit (PDB ID 6FTI) (9), achieving a pose that fitted adequately into the donor cavity. Due to the larger number of monosaccharide units and the distinct conformations sampled in our simulations, it was not possible to reproduce the exact conformation of the small LLO found in the cryo-EM structure of the mammalian OST complex (9). In spite of that, alignment of the newly formed complex to the full OST complex did not generate any clashes with the other subunits, further supporting the solution obtained in the molecular docking calculation. Based on our model and on the three reports of the recently released structures of the full OST complex, we verified that the glycan portion of the LLO molecule must interact with EL5, while inducing the ordering of TMH9, and that EL5 must be in a flexible state, not defined by a specific secondary structure. We could not observe the predicted entry route, since the high fluctuations of the hydrophobic tail generated conformations that were not compatible with fitting its whole structure within the OST transmembrane domain.

Figure 8: Docking poses obtained for each studied LLO inside the corresponding OSTs. A) Bacterial LLO complexed with PglB depicting the following interactions: Bac-O7 and Tyr468-OH, Asn-ND2 and Bac-C1; B) Archaeal LLO complexed with AglB depicting the following interactions: Glc1-S2 and His555-ND1, Glc1-S3 and Trp215-NE1, Asn-ND2 and Glc1-C1; C) Eukaryotic LLO complexed with Stt3 subunit depicting the following interactions: NAcGlc1-O7 and Tyr530-OH, Asn-ND2 and NAcGlc-C1. All phosphates interacted with the respective ion, when present in the crystallographic structure.
CONCLUSION

Understanding the conformations and dynamics of distinct LLOs could support a better comprehension of complex formation with OSTs and, consequently, provide insights to enhance the efficiency of OST glycosylation turnover, as well as allow the engineering of these enzymes. In this report, we characterized three LLOs, one from each domain of life, employing molecular modelling techniques such as QM calculations and MD simulations. By properly parameterizing the isoprenoid unit (one of the building blocks of the LLOs hydrophobic tail), achieving reasonable agreement with experimental data (errors below 10%), we were able to describe the highly flexible behavior of this portion of the molecule, demonstrating its distribution inside the membranes. Additionally, the glycan chains were evaluated with respect to their motions and interactions, demonstrating a clear preference for a position parallel to the membrane bilayer, mostly in close contact with the phospholipid head groups, similarly to previous reports (12). Molecular docking trials provided a complex for each LLO inside the respective OST cavity, which respected the cutoff distances expected from previously determined OST structures. By correlating recent crystallographic data from the bacterial OST (PglB) (41) with our results from the archaeal LLO docking model, we were able to propose an explanation for the change in amino acids in the WWDXGX motif. These observations highlight the substrate selectivity role of this motif, allowing the prediction of interactions between glycans and this region. Furthermore, we described a rationale for the amino acid substitutions seen in Archaea and for the sulfate group at C2 observed in this LLO. Besides that, all of our models sampled conformations that could properly fit inside the donor substrate binding site, further supporting our findings.
We hope this work contributes to future investigations regarding the glycosylation activity and the distinct carbohydrate moieties distributed among other species.

AUTHOR CONTRIBUTIONS

P.R.A. and C.P. designed the research, carried out all simulations, analyzed the data, and wrote the article. L.P. parameterized the molecules and wrote the article. M.D.P. analyzed the data. H.V. conceived and supervised the project, designed the research, and wrote the article.

ACKNOWLEDGMENTS

This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), MCT, Brasília, DF, Brazil; and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), MEC, Brasília, DF, Brazil.

REFERENCES

1. Varki, A., 1993. Biological roles of oligosaccharides: all of the theories are correct. Glycobiology 3:97–130.
2. Larkin, A., and B. Imperiali, 2011. The expanding horizons of asparagine-linked glycosylation.
3. Kowarik, M., S. Numao, M. F. Feldman, B. L. Schulz, N. Callewaert, E. Kiermaier, I. Catrein, and M. Aebi. N-linked glycosylation of folded proteins by the bacterial oligosaccharyltransferase. Science (New York, N.Y.) 1148–50.
4. Kowarik, M., N. M. Young, S. Numao, B. L. Schulz, I. Hug, N. Callewaert, D. C. Mills, D. C. Watson, M. Hernandez, J. F. Kelly, M. Wacker, and M. Aebi, 2006. Definition of the bacterial N-glycosylation site consensus sequence. The EMBO Journal 25:1957–1966.
5. Weerapana, E., and B. Imperiali, 2006. Asparagine-linked protein glycosylation: From eukaryotic to prokaryotic systems. Glycobiology 16:91–101.
6. Schwarz, F., C. Lizak, Y. Y. Fan, S. Fleurkens, M. Kowarik, and M. Aebi, 2011. Relaxed acceptor site specificity of bacterial oligosaccharyltransferase in vivo. Glycobiology 21:45–54.
7. Wacker, M., M. F. Feldman, N. Callewaert, M. Kowarik, B. R. Clarke, N. L. Pohl, M. Hernandez, E. D. Vines, M. A. Valvano, C. Whitfield, and M. Aebi, 2006. Substrate specificity of bacterial oligosaccharyltransferase suggests a common transfer mechanism for the bacterial and eukaryotic systems. Proceedings of the National Academy of Sciences.
8. Wacker, M., D. Linton, P. G. Hitchen, M. Nita-Lazar, S. M. Haslam, S. J. North, M. Panico, H. R. Morris, A. Dell, B. W. Wren, and M. Aebi, 2002. N-Linked Glycosylation in Campylobacter jejuni and Its Functional Transfer into E. coli. Science 298:1790–1793.
9. Braunger, K., S. Pfeffer, S. Shrimal, R. Gilmore, O. Berninghausen, E. C. Mandon, T. Becker, F. Förster, and R. Beckmann, 2018. Structural basis for coupling protein transport and N-glycosylation at the mammalian endoplasmic reticulum. Science 360:215–219.
10. Lizak, C., S. Gerber, S. Numao, M. Aebi, and K. P. Locher, 2011. X-ray structure of a bacterial oligosaccharyltransferase. Nature.
11. Matsumoto, S., A. Shimada, J. Nyirenda, M. Igura, Y. Kawano, and D. Kohda, 2013. Crystal structures of an archaeal oligosaccharyltransferase provide insights into the catalytic cycle of N-linked protein glycosylation. Proceedings of the National Academy of Sciences of the United States of America 110:17868–73.
12. Kern, N. R., H. S. Lee, E. L. Wu, S. Park, K. Vanommeslaeghe, A. D. Mackerell, J. B. Klauda, S. Jo, and W. Im, 2014. Lipid-linked oligosaccharides in membranes sample conformations that facilitate binding to oligosaccharyltransferase. Biophysical Journal 107:1885–1895.
13. Pronk, S., S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. Van Der Spoel, B. Hess, and E. Lindahl, 2013. GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29:845–854.
14. Oostenbrink, C., A. Villa, A. E. Mark, and W. F. van Gunsteren, 2004. A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6. Journal of Computational Chemistry 25:1656–76.
15. Pol-Fachin, L., V. H. Rusu, H. Verli, and R. D. Lins, 2012. GROMOS 53A6 GLYC, an Improved GROMOS Force Field for Hexopyranose-Based Carbohydrates. Journal of Chemical Theory and Computation 8:4681–4690.
16. Pol-Fachin, L., H. Verli, and R. D. Lins, 2014. Extension and validation of the GROMOS 53A6(GLYC) parameter set for glycoproteins. Journal of Computational Chemistry 35:2087–95.
17. Huang, J., and A. D. Mackerell, 2013. CHARMM36 all-atom additive protein force field: Validation based on comparison to NMR data. Journal of Computational Chemistry 34:2135–2145.
18. Rusu, V. H., R. Baron, and R. D. Lins, 2014. PITOMBA: Parameter Interface for Oligosaccharide Molecules Based on Atoms. Journal of Chemical Theory and Computation 10:5068–5080.
19. Pol-Fachin, L., R. V. Serrato, and H. Verli, 2010. Solution conformation and dynamics of exopolysaccharides from Burkholderia species. Carbohydrate Research 345:1922–1931.
20. Pol-Fachin, L., C. L. Fernandes, and H. Verli, 2009. GROMOS96 43a1 performance on the characterization of glycoprotein conformational ensembles through molecular dynamics simulations. Carbohydrate Research 344:491–500.
21. van der Spoel, D., and E. Lindahl. Brute-Force Molecular Dynamics Simulations of Villin Headpiece: Comparison with NMR Parameters. The Journal of Physical Chemistry B 11178–11187.
22. Abraham, M. J., T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, and E. Lindahl, 2015. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1:19–25.
23. Pedebos, C., L. Pol-Fachin, and H. Verli, 2012. Unrestrained conformational characterization of Stenocereus eruca saponins in aqueous and nonaqueous solvents. Journal of Natural Products 75:1196–1200.
24. Figueira, F., A. S. F. Farinha, P. V. Muteto, M. D. Polêto, H. Verli, M. T. S. R. Gomes, A. C. Tomé, J. A. S. Cavaleiro, and J. P. C. Tomé. [28]Hexaphyrin derivatives for anion recognition in organic and aqueous media. Chem. Commun. 2181–2184.
25. Arantes, P. R., L. G. Sachett, C. S. Graebin, and H. Verli, 2014. Conformational characterization of ipomotaosides and their recognition by COX-1 and 2. Molecules 19:5421–5433.
26. Caleman, C., P. J. van Maaren, M. Hong, J. S. Hub, L. T. Costa, and D. van der Spoel. Force Field Benchmark of Organic Liquids: Density, Enthalpy of Vaporization, Heat Capacities, Surface Tension, Isothermal Compressibility, Volumetric Expansion Coefficient, and Dielectric Constant. Journal of Chemical Theory and Computation 61–74.
27. Berendsen, H. J. C., J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak, 1984. Molecular dynamics with coupling to an external bath. The Journal of Chemical Physics 81:3684.
28. Horta, B. A. C., P. T. Merz, P. F. J. Fuchs, J. Dolenc, S. Riniker, and P. H. Hünenberger. A GROMOS-Compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set. Journal of Chemical Theory and Computation 3825–3850.
29. Kandt, C., W. L. Ash, and D. Peter Tieleman, 2007. Setting up and running molecular dynamics simulations of membrane proteins. Methods 41:475–488.
30. Berendsen, H. J. C., J. R. Grigera, and T. P. Straatsma, 1987. The Missing Term in Effective Pair Potentials. Journal of Physical Chemistry 91:6269–6271.
31. Hess, B., H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije, 1997. LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry 18:1463–1472.
32. Darden, T., D. York, and L. Pedersen, 1993. Particle mesh Ewald: An N-log(N) method for Ewald sums in large systems. The Journal of Chemical Physics 98:10089.
33. Parrinello, M., 1981. Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics 52:7182.
34. Nosé, S., and M. L. Klein, 1983. Constant pressure molecular dynamics for molecular systems. Molecular Physics 50:1055–1076.
35. Bussi, G., D. Donadio, and M. Parrinello, 2007. Canonical sampling through velocity rescaling. The Journal of Chemical Physics 126:014101.
36. Nosé, S., 1984. A molecular dynamics method for simulations in the canonical ensemble. Molecular Physics 52:255–268.
37. Hoover, W. G., 1985. Canonical dynamics: Equilibrium phase-space distributions. Physical Review A 31:1695–1697.
38. Chickos, J. S., and W. E. A. Jr. Enthalpies of Vaporization of Organic and Organometallic Compounds, 1880–2002. Journal of Physical and Chemical Reference Data 519.
39. Haynes, W. M., 2014. CRC handbook of chemistry and physics. CRC press.
40. Pérez, S., and D. De Sanctis, 2017. Glycoscience@Synchrotron: Synchrotron radiation applied to structural glycoscience. Beilstein Journal of Organic Chemistry 13:1145–1167.
41. Napiórkowska, M., J. Boilevin, T. Sovdat, T. Darbre, J. L. Reymond, M. Aebi, and K. P. Locher, 2017. Molecular basis of lipid-linked oligosaccharide recognition and processing by bacterial oligosaccharyltransferase. Nature Structural and Molecular Biology 24:1100–1106.
42. Matsumoto, S., Y. Taguchi, A. Shimada, M. Igura, and D. Kohda, 2017. Tethering an N-glycosylation sequon-containing peptide creates a catalytically competent oligosaccharyltransferase complex. Biochemistry.
43. Li, H., M. Chavan, H. Schindelin, W. J. Lennarz, and H. Li, 2008. Structure of the Oligosaccharyl Transferase Complex at 12 Å Resolution. Structure 16:432–440.

SUPPLEMENTARY MATERIAL

Figure S1: Density profiles for the subdivided parts of the studied LLOs without glycan chains, along the membrane normal (Z axis) of the membrane bilayers: membrane head groups (yellow), lipid acyl chains (black), isoprenoid chain (blue), pyrophosphate linkage (red), and the last carbon atom of the isoprenoid chain (green).
The pyrophosphate linkage, dolichol chain, and dolichol last carbon distributions were scaled by a factor of ten to facilitate the observation of data.

Figure S2: Distributions of the distances between different carbons of the hydrophobic tail of the LLOs without glycan chains. A) Histograms of the distances between carbons C1A and C9E (red line) and carbons C1A and C19E (blue line) for every LLO. B) Distribution of distance values between carbons C1A and C9E (red dots) and carbons C1A and C19E (blue dots) along the X and Y axes.

An online supplement to this article can be found by visiting BJ Online at http://www.biophysj.org.

A.4 Role of structural ions on the dynamics of the Pseudomonas fluorescens 07A metalloprotease

Psychrotrophic bacteria produce heat-resistant proteases that can hydrolyze milk proteins, leading to losses in dairy quality and yield. Characterization studies of these enzymes provide a better understanding of their activity and are fundamental for developing technologies that circumvent the associated problems, such as loss of the thermal stability of milk, gelation during UHT treatment, or changes to the standard flavor. The characterization of an extracellular metalloprotease produced by Pseudomonas fluorescens strain 07A made it possible to determine the conditions that favor its production and activity in milk and to evaluate the role of divalent ions (in particular, Ca2+ and Mn2+) in its activity and heat resistance. In the work that follows, the three-dimensional structure of the metalloprotease was built using molecular modelling techniques, and amino acid residue conservation analyses allowed the positions of the structural Ca2+ ions and of the catalytic cofactor Mn2+ to be derived from the templates used. The enzyme was subjected to molecular dynamics simulations under different conditions: I - without structural ions bound to the structure; II - in the presence of Mn2+ only; and III - in the presence of Mn2+ and Ca2+. Each system was simulated at 310 K and 353 K to analyze the effect of temperature on stability and molecular dynamics. Each system was simulated in triplicates of 500 ns, totaling 18 simulations (9 microseconds). The experimental results obtained in this work demonstrate that the Mn2+ ion can compete for the catalytic site of the enzyme with other divalent ions, such as Ca2+ or Zn2+. The computational results suggest that the presence of structural Ca2+ ions partially protects the enzyme from the denaturation promoted by the increase in temperature, a conformational destabilization in the C-terminal portion of the enzyme. In addition, our simulations described a collective opening and closing movement of the catalytic site associated with the presence of Ca2+ and Mn2+ bound to their respective domains. Besides providing a better understanding of the molecular mechanisms associated with the activity and heat resistance of this metalloprotease, these data may also support the development of new measures to control the activity of proteases from P. fluorescens and other microorganisms.

Food Chemistry 286 (2019) 309–315

Research Article

Role of structural ions on the dynamics of the Pseudomonas fluorescens 07A metalloprotease

Marcelo D. Polêto (a,1), Maura P. Alves (b,1), Rodrigo Ligabue-Braun (c), Monique R. Eller (b), Antonio Fernandes De carvalho (b,⁎)

a Structural Bioinformatics Group, Biotechnology Center, Federal University of Rio Grande do Sul, Av. Bento Gonçalves, 9500 Porto Alegre, RS, Brazil
b Laboratório Inovaleite, Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa, Av.
Peter Henry Rolfs, s/n – Viçosa, Brazil
c Department of Pharmaceutical Sciences, Federal University of Health Sciences of Porto Alegre (UFCSPA), Porto Alegre, RS, Brazil

ARTICLE INFO

Keywords: Molecular motion; Enzymatic activity; Molecular dynamics

ABSTRACT

The molecular dynamics of the Pseudomonas fluorescens 07A metalloprotease in the presence of structural Ca2+ and Mn2+ ions was evaluated. Seven Ca2+ ions are primarily bound to the C-terminus, while a divalent cation is located at the catalytic site, acting as a cofactor. The observed enzyme’s experimental activity suggests that Mn2+ could compete for the active site of the enzyme with Ca2+, Zn2+ or other divalent cations, thus providing greater catalytic power to the enzyme. Our molecular dynamics simulations suggest that these ions partially protect the enzyme’s structure from thermal denaturation. Moreover, our simulations have shown a collective opening-closing movement of the active site in simulations with structural Ca2+ and Mn2+ ions bound, leading to a proposal of a dynamical model of P. fluorescens 07A metalloprotease active and inactive conformations. These findings can support the development of measures to control the activity of P. fluorescens and other spoilage microorganism proteases.

1. Introduction

Psychrotrophic bacteria produce thermoresistant proteases that hydrolyze milk proteins, which leads to loss of quality and yield in dairy production. Studies on the characterization of these enzymes and an understanding of the conditions that influence their production and activity are essential to surmounting technological issues associated with their activity, such as UHT milk gelling, loss of milk thermal stability and off-flavor formation, as well as losses that can occur during cheese production (Andreani et al., 2016; Machado, Baglinière, Marchand, Van Coillie, Vanetti, De Block, & Heyndrickx, 2017; Marchand, Duquenne, Heyndrickx, Coudijzer, & De Block, 2017).
An extracellular metalloprotease produced by the Pseudomonas fluorescens 07A strain has been characterized (Alves, Salgado, Eller, Vidigal, & Carvalho, 2016) and the conditions favoring its production and activity in milk were determined (Alves et al., 2018). It was shown that the presence of calcium ions did not increase the enzyme’s activity or thermostability, even though the enzyme had a binding domain for this ion and showed significantly decreased activity in the presence of EDTA, suggesting that Ca2+ ions are already bound to the structure in natural conditions. On the other hand, increased activity and thermostability were observed when Mn2+ ions were present. Given the presence of both manganese and calcium ions in milk (1200 mg/L calcium and 30 µg/L manganese) (Fox & McSweeney, 1998), the interaction dynamics between the enzyme and these ions can help determine the enzyme’s activity and stability at a molecular level. Thus, this knowledge could aid in the development of new processes to control the enzyme activity in milk. Moreover, determining the metalloprotease structural dynamics in solutions where ions are absent or present could provide valuable information on the metalloprotease’s stability in different environmental conditions, as well as its activity levels when subjected to high temperatures. Thus, molecular modelling calculations were carried out, followed by molecular dynamics simulations of the previously characterized P. fluorescens 07A metalloprotease at different temperatures and in the presence or absence of its structural ions. The results presented here could support the dairy industry in its search for processing alternatives or for adapted technologies geared towards tighter control of protease activity.

⁎ Corresponding author. E-mail address: antoniofernandes@ufv.br (A.F. De carvalho).
1 Both are first authors.
https://doi.org/10.1016/j.foodchem.2019.01.204
Received 1 October 2018; Received in revised form 24 January 2019; Accepted 31 January 2019; Available online 07 February 2019
0308-8146/ © 2019 Elsevier Ltd. All rights reserved.
M.D. Polêto, et al. Food Chemistry 286 (2019) 309–315
2. Material and methods
2.1. Biological assays
The enzyme used in the experiments was produced from the bacterium P. fluorescens 07A and purified as described by Alves et al. (2016). The activity and thermostability (353 K) of the enzyme were evaluated with calcium and manganese divalent cations present, both individually and together, at 5 mM concentration. Activity and thermostability were also evaluated in the presence of the EDTA chelator. The proteolytic activity was determined using azocasein (Sigma-Aldrich) as a substrate, as described by Ayora and Götz (1994), with modifications. An aliquot of 250 µL of azocasein solution at 2% (w/v) in 40 mM Tris-HCl, pH 7.5 was added to 150 µL of the test solution. The mixture was agitated and incubated at 37 °C for 8 h, and the reaction was stopped with the addition of 1.2 mL of 10% (w/v) trichloroacetic acid. After 15 min of incubation at room temperature, the mixture was centrifuged at 12,000 × g for 15 min at 4 °C. The supernatant was neutralized with the addition of 1.0 mL of 1 M NaOH. The absorbance of the resulting solution was measured at 440 nm using a SpectraMax M2 spectrometer (Molecular Devices, Sunnyvale, California, USA). The results were expressed as relative activity, with control treatments taken as 100%.
2.2. Molecular modelling
As the extracellular metalloprotease of P. fluorescens 07A characterized by Alves et al. (2016) had no determined three-dimensional structure, the structure was obtained by comparative molecular modelling.
Briefly, a BLAST search against RCSB PDB (Altschul, Gish, Miller, Myers, & Lipman, 1990; Berman et al., 2000) was performed to identify suitable structural templates with the highest sequence identity and coverage. The template identified (PDB ID 1G9K, 67% identity, 99% coverage) (Aghajari et al., 2003) was aligned to the target sequence and modelled using Modeller 9.16 (Webb and Sali, 2016). The template was also used to determine the ions’ spatial positioning. Active site conservation analyses were performed by structural comparison and superposition with PyMol (Schrödinger LLC), using structures identified by BLAST search with identities to the inspected protein above 51% and coverage above 95%. These structures correspond to PDB IDs 1AF0, 1GO7, 1GO8, 1H71, 1JIW, 1K7G, 1K7I, 1K7Q, 1KAP, 1SAT, 1SMP, 1SRP, 3HB2, 3HBU, 3HBV, 3U1R, 4I35, 5D7W) (Hege, Feltzer, Gray, & Baumann, 2001; Aghajari et al., 2003; Hege & Baumann, 2001; Baumann, Wu, Flaherty, & McKay, 1993; Baumann, 1994; Baumann, Bauer, Letoffe, Delepelaire, & Wandersman, 1995; Hamada et al., 1996; Ogle et al., 2001; Grabarse et al., 2001; Hoog et al., 1995; Zhang et al., 2011; Wu, Ran, Wang, & Xu, 2016). Protein domains and signatures were identified by a comparative search in the Conserved Domain Database (Marchler-Bauer et al., 2017). Additional sequence conservation analyses were performed with AliView (Larsson, 2014). Phylogenetic analyses were performed with MEGA7, under Maximum-Likelihood method, using WAG substitution model with gamma distributions (Whelan & Goldman, 2001; Kumar, Stecher, & Tamura, 2016). Sequences for the latter two analyses were selected using BLAST search for highly similar sequences (similarity above 95%, identity above 85%, total coverage). 2.3. Molecular dynamics simulation setups In order to evaluate the structural dynamics of the P. 
fluorescens 07A metalloprotease, molecular dynamics simulations were carried out using the GROMACS 5.1.4 package (Abraham et al., 2015) and the AMBER99SB-ILDN force field (Lindorff-Larsen et al., 2010). In addition, Lennard-Jones parameters for the Mn2+ ion were obtained from Babu and Lim (2005). These were calibrated using hydration free energies as targets and resulted in a strong agreement between experimental and simulated values. The effect of different temperatures on protease dynamics was evaluated using two simulation temperatures: 310 K (optimal activity temperature) and 353 K (industrial treatment temperature). In addition, the roles of structural Ca2+ and Mn2+ ions at the active site were evaluated by simulating three protein conditions: I – holo enzyme (protease + Ca2+ + Mn2+), II – apo enzyme (protease without metal ions) and III – enzyme with only Mn2+ at the active site. All simulations were carried out in triplicate, for a total of 18 simulations. The simulation systems were constructed by building a dodecahedral simulation box around the protein, with a distance of 10 Å between the protein and the box edges. The systems were then minimized in vacuo. The boxes were filled with TIP3P water (Jorgensen, Chandrasekhar, Madura, Impey, & Klein, 1983) and bulk Na+ and Cl− ions were added to neutralize the total net charges produced by amino acid residues at pH 7.0. The systems were minimized once again and equilibrated at the selected temperatures for 2 ns in an NVT ensemble, using position restraint forces of 1000 kJ/mol for all protein atoms, a V-rescale thermostat (Bussi, Donadio, & Parrinello, 2007) and tau_t = 0.1 ps.
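The simulation matrix described above (three ion conditions, two temperatures, triplicates) can be enumerated as a quick sanity check on the run count. This is only a minimal sketch: the condition labels and variable names are illustrative assumptions and do not come from the original setup scripts; the 500 ns production length is the value reported in this section.

```python
from itertools import product

# Ion conditions and temperatures as described in the text; labels are illustrative.
conditions = ["holo (Ca2+ + Mn2+)", "apo (no metal ions)", "Mn2+ only"]
temperatures_K = [310, 353]
replicates = [1, 2, 3]

# One entry per independent simulation: (condition, temperature, replicate).
runs = list(product(conditions, temperatures_K, replicates))
n_runs = len(runs)

production_ns = 500  # production length per run, as reported in this section
total_us = n_runs * production_ns / 1000.0

print(n_runs, "simulations,", total_us, "microseconds in total")
```

Enumerating the runs this way also gives a convenient list to loop over when naming output directories, although how the original runs were organized is not stated in the source.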
Subsequently, a series of equilibrations were carried out in the NPT ensemble with a Parrinello-Rahman barostat (Parrinello & Rahman, 1981) at 1 bar and tau_p = 2.0 ps, using the following position restraint scheme:
2 ns – Heavy atoms (800 kJ/mol)
2 ns – Mainchain + β-carbon (600 kJ/mol)
2 ns – Mainchain (400 kJ/mol)
2 ns – Protein backbone (200 kJ/mol)
3 ns – α-carbons (100 kJ/mol)
3 ns – α-carbons (50 kJ/mol)
This protocol was selected to provide a slow, gentle metalloprotease equilibration while accommodating the liquid phase. Next, all 18 systems were simulated for 500 ns (totaling 9 µs of simulation time) using the LINCS algorithm to maintain covalent bonds (Hess, 2008; Hess, Bekker, Berendsen, & Fraaije, 1997). A long-range interaction cutoff of 10 Å and the PME algorithm were used to treat both Lennard-Jones and Coulomb interactions. Cavities were mapped using the trj_cavity algorithm (Paramo et al., 2014). All simulations were carried out using the HPC resources of the SDumont supercomputer at the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil).
3. Results and discussion
3.1. Biological assays
The presence of calcium did not affect the enzymatic activity in our test when EDTA was absent, which may indicate that the enzyme already had Ca2+ ions from the extract bound to its structure. However, when Mn2+ was present, the activity increased up to 9× (Supplementary Table 1), a result previously observed by Alves et al. (2016). In the original buffer with no ion addition, the heat treatment reduced the enzyme’s activity to 19% in 5 min. This indicates that Ca2+ partially protects the enzyme from denaturation, since the enzyme maintained 41% of its residual activity when the ion was present. Likewise, in the presence of Mn2+, the enzyme activity not only increased, but the thermal denaturation practically ceased (Supplementary Table 2).
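Residual activity after heat treatment is the ratio between the activity measured after and before the treatment. The short sketch below works this out for the EDTA-treated enzyme using the two relative activities reported in this section (28.8% before and 21.6% after 10 min at 353 K); the function name is an illustrative assumption.

```python
def residual_activity(before_pct, after_pct):
    """Activity remaining after treatment, as a percentage of the pre-treatment activity."""
    return 100.0 * after_pct / before_pct

# EDTA-treated enzyme: relative activity before and after 10 min at 353 K (values from the text).
edta_residual = residual_activity(28.8, 21.6)
print(round(edta_residual, 1))
```

Because the ratio is normalized by the pre-treatment value, a sample with a low starting activity can show a high residual percentage, which is the caveat the text raises for the EDTA result.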
The combined action of both ions protected the enzymatic activity even more than when only Mn2+ was present, in a synergistic effect. The presence of EDTA reduced the enzyme’s activity to 30% of its original levels, which shows the importance of calcium for enzymatic catalysis. When this chelator was present, the heat treatment at 353 K reduced activity from 28.8% (relative to 100% in the absence of this chelator) to only 21.6% in 10 min, which corresponds to a residual activity of 75% (Supplementary Table 3). This value exceeds the residual activity when this chelator is absent (18.28% in the absence of added ion – Supplementary Table 2). However, these results may be due to the low original activity the enzyme has previously shown in the presence of EDTA.
Fig. 1. Modeled Pseudomonas fluorescens 07A extracellular metalloprotease, colored by domains. Yellow spheres denote Ca2+ ions, magenta sphere denotes Mn2+ ion. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
3.2. Structural characterization of extracellular metalloprotease of Pseudomonas fluorescens 07A
The metalloprotease studied in this work has a typical bacterial metalloprotease fold (Fig. 1), composed of three domains: an N-terminus region (residues 1–70), a zinc-dependent serralysin-like domain (residues 71–258) and a characteristic serralysin-like C-terminus (residues 259–476). Ca2+ ions are primarily bound to the serralysin-like C-terminus between the β-strands, forming a complex interaction network (Supplementary Fig. 3), and can also be found in the loop region (residues 58–65). A divalent cation is located at the catalytic site, complexed with residues H183, H187 and H193. It was not determined, however, whether the role of the Ca2+ ions is exclusively structural or whether calcium could compete with other divalent cations for the active site.
In addition to EDTA, phenanthroline also inhibits the activity of this enzyme (Alves et al., 2016), which would indicate that Zn2+ could be a cofactor in the active site. Three-dimensional positioning of ions was based on sequence and structure conservation analyses. As the multiple structure and sequence comparisons have revealed, both the general enzyme fold and active site residues were conserved among the inspected proteins (Fig. 2 and Supplementary Fig. 1). The alignment of multiple Pseudomonas metalloprotease sequences (Supplementary Fig. 1) highlights the high level of conservation among these proteins, including the Ca2+ binding motifs and active site residues. This has been confirmed by the phylogenetic reconstruction (Supplementary Fig. 3), which defines the Pseudomonas metalloprotease studied here as very similar to Pseudomonas bacterial metalloproteases. Such similarities underscore the relevance of understanding the finer details of enzymatic tuning, especially in cases like this one, where a thermoresistant protease is able to spoil pasteurized milk. Thus, our findings on the importance of these ions to the enzyme’s structure and activity may further assist in the development of strategies to control the enzyme’s activity in milk. The fact that general enzyme fold and active site residues are conserved indicates that the control applied to this protease could present an efficient means for controlling several other proteases from spoilage microorganisms in foods.
Fig. 2. Superposition of 18 metalloprotease structures from the RCSB PDB (individually colored), highlighting the active site conservation.
3.3.
Molecular dynamics
To further investigate the enzyme dynamics and its relationship with structural Ca2+ and Mn2+ ions in an aqueous solvent, molecular dynamics simulations were carried out in triplicate, for systems I – where the metal ions were absent, II – with only Mn2+ bound to the catalytic site, and III – with Mn2+ and Ca2+ both bound to the enzyme. The minimum distance between each divalent cation and the protein was calculated throughout the trajectories in order to confirm that these ions remained in their binding sites and that their interaction network was maintained during the simulation (Supplementary Table 4). Root mean square deviations (RMSD) were calculated for each system and for both simulation temperatures throughout the simulation, using the modeled structure as reference. These were further averaged within the replicates (Fig. 3). These calculations were carried out for residues 71–476, and thus did not take into account the N-terminus region, due to its high flexibility (Kufareva & Abagyan, 2012). The tendency towards lower RMSD values obtained for system III at 353 K indicates that the presence of Ca2+ ions bound to the serralysin-like domain and the Mn2+ bound to the active site might play a role in stabilizing the enzyme’s structure. These results may be related to the experimentally observed differential enzyme activity in the presence of such divalent cations, which might explain the 9× increase observed in the enzymatic activity in the presence of Mn2+ alone and 7× in the presence of both ions (Supplementary Table 1). The simultaneous addition of both ions at the same concentration levels suggests a slight thermodynamic preference of the active site for binding to Mn2+ or that the enzyme’s activation energy decreases in the presence of this ion in the active site. The presence of the Mn2+ ion would compensate for the low enzyme activity demonstrated when Ca2+ alone was present, even if the latter is present in greater/equal concentration.
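The RMSD analysis described above (deviation from a reference structure, then averaging over replicates) can be sketched with NumPy as follows. This is an illustrative sketch only, not the tool used in the study: it assumes the frames have already been superimposed on the reference, and the function names and toy data are hypothetical.

```python
import numpy as np

def rmsd(coords, reference):
    """RMSD between two (n_atoms, 3) coordinate arrays that are already superimposed."""
    diff = np.asarray(coords, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def replicate_average(rmsd_series):
    """Average per-frame RMSD over replicates; input shape (n_replicates, n_frames)."""
    return np.mean(np.asarray(rmsd_series, dtype=float), axis=0)

# Toy data: a 2-atom "structure" with every atom displaced by 1 Å along x.
reference = np.zeros((2, 3))
frame = reference + np.array([1.0, 0.0, 0.0])
value = rmsd(frame, reference)

# Three hypothetical replicates with two frames each.
avg = replicate_average([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
print(value, avg)
```

Restricting the calculation to residues 71–476, as done in the text, would amount to selecting the corresponding atom rows before calling `rmsd`.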
A temperature increase (Fig. 3 – red) yielded higher protein mobility due to the increment of kinetic energy within the system. Still, when the simulations were evaluated at 353 K, the presence of both Ca2+ and Mn2+ bound to their respective domains (Fig. 3 – bottom panel) yielded lower RMSD values, in contrast to simulations carried out when Ca2+ ions were absent (Fig. 3 – upper and middle panels). The presence of Mn2+ alone at the active site (Fig. 3 – middle panel) produces a behavior distinct from that observed when it is absent (Fig. 3 – upper panel). Together, these results indicate that Ca2+ ions play an important structural role in metalloprotease thermoresistance. In addition, time-averaged root mean square fluctuations (RMSF) of each residue were evaluated for both temperature conditions and values were averaged between triplicates. The differences were calculated by subtracting the RMSF profile obtained in the absence of Ca2+ and Mn2+ from the RMSF profile calculated for the system with the structural ions present (Fig. 4). Our data show a higher residue fluctuation in domain 259–476 when Ca2+ ions were absent at both simulation temperatures, thus reinforcing the hypothesis that such ions are important for this domain’s structural stability. In fact, the differential RMSF calculation shown in Fig. 4 revealed a higher overall fluctuation for the protein simulated in the absence of structural ions at 353 K (more negative values) than for the system simulated at 310 K (data not shown). Along with the RMSD data, these results confirm that the structural Ca2+ ions partially protect the enzyme’s structure from thermal denaturation due to the strong interaction network formed between the ions and the serralysin-like domain (Supplementary Fig. 3), since the absence of these structural ions increases the fluctuation of the residues in the binding site and the deviation of the overall structure from the initial conformation.
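The differential RMSF profile described above is the RMSF with structural ions present minus the RMSF without them, so negative values mark positions that fluctuate more when the ions are absent. The sketch below illustrates that subtraction with NumPy. It is an assumption-laden illustration: it works per atom on pre-aligned toy trajectories, whereas the study reports per-residue values averaged over triplicates, and the function names are hypothetical.

```python
import numpy as np

def rmsf(trajectory):
    """Per-atom RMSF for a trajectory of shape (n_frames, n_atoms, 3), assumed pre-aligned."""
    traj = np.asarray(trajectory, dtype=float)
    mean_pos = traj.mean(axis=0)                    # time-averaged position of each atom
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared deviation, shape (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))

def differential_rmsf(with_ions, without_ions):
    """RMSF(with ions) - RMSF(without ions); negative means more mobile without ions."""
    return rmsf(with_ions) - rmsf(without_ions)

# Toy trajectories: 2 frames, 1 atom, oscillating along x.
with_ions = [[[-1.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]]      # amplitude 1 Å
without_ions = [[[-2.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]]   # amplitude 2 Å
delta = differential_rmsf(with_ions, without_ions)
print(delta)
```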
Moreover, the flexibility of the loops at residues 58–65 and 122–127 increased when simulated at 353 K in the absence of Ca2+, but remained similar at 310 K, suggesting that these specific regions are more sensitive to higher temperatures. This is somewhat expected, since both loops 58–65 and 122–127 directly interact with a structural Ca2+ ion in natural conditions.
Fig. 3. Average RMSD calculations of the metalloprotease during triplicate simulations at 310 K (blue) and 353 K (red). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 4. Differential RMSF profiles for each metalloprotease residue obtained in simulations at 310 K (blue) and 353 K (red). Positive values indicate higher fluctuations in simulations in the presence of structural Ca2+ and Mn2+, while negative values indicate higher fluctuations in simulations without the ions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Interestingly, a higher fluctuation pattern was found for residues 1–10 in our simulations with Ca2+ and Mn2+ bound to the protein compared with the simulations in the absence of such ions. However, the opposite was found for residues 10–15, and no clear evidence at a molecular level was found that could explain this region’s behavior either in the presence or absence of Ca2+ bound to the enzyme. Nevertheless, it is important to mention that the region 1–20 is poorly described in the 3D structures resolved by X-ray crystallography and used here for structural comparisons, due to the high flexibility in this region, which is, to some extent, described in our simulations. The importance of calcium ions has also been demonstrated for the structure, thermal stability and substrate affinity of the serine protease proteinase K (Liu, Tao, Meng, Fu, & Zang, 2011).
The authors found that removal of the Ca2+ ions from the system enhanced the general flexibility of the enzyme and decreased its thermal stability. However, in segments surrounding the substrate-binding pockets, the flexibility decreased when Mn2+ and Ca2+ ions were absent. In other words, the enzyme’s affinity for the substrate is greater when these ions are present. It was also found that a molecular opening/closing motion occurs at the active site for this enzyme (Liu, Meng, Fu, & Zang, 2010). However, the removal of calcium does not affect the catalytic triad residue fluctuations (Liu et al., 2011). This confirms the biological findings that show maintenance of the enzyme activity even when Ca2+ ions are removed from the system. Supporting this observation, the role of Ca2+ ions in structural stabilization has been shown for multiple proteins (McPhalen, Strynadka, & James, 1991; Strynadka & James, 1991), including other enzymes, such as bacterial nitrous oxide reductase (Schneider & Einsle, 2016), and non-enzymatic proteins, such as the major cat allergen, Fel d 1 (Ligabue-Braun, Sachett, Pol-Fachin, & Verli, 2015), and the human α-1 acid glycoprotein (Fernandes, Ligabue-Braun, & Verli, 2015). The dynamics of the catalytic cavity were investigated in our simulations by mapping its volume (Fig. 5) using trj_cavity, with the XYZ coordinates of the center of mass of HIS183 as seed for searching the cavity and 3 voxel dimensions. In the absence of Ca2+ and Mn2+, the catalytic site volume decreases, probably due to a structural destabilization at the divalent cation binding site, collapsing the binding pocket (catalytic site volume below 2500 Å3). By contrast, the presence of only Mn2+ preserves the cavity’s structural features, such as the loop regions 136–141 and 196–204, which were not buried in the catalytic site. Moreover, a repetitive behavior of increasing and decreasing volumes of the catalytic cavity was observed in the enzyme when bound to Ca2+ and Mn2+.
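The cavity-volume thresholds reported in this section (collapsed below 2500 Å3, closed between 2500 and 3000 Å3, open above 3200 Å3) can be turned into a simple state classifier for a volume time series. The sketch below is a hedged illustration: the function names are hypothetical, treating 3000–3200 Å3 as an "intermediate" state is an assumption (the text does not label that range), and the volumes are toy numbers rather than trj_cavity output.

```python
def classify_cavity(volume_A3):
    """Map a catalytic cavity volume (in cubic angstroms) to a conformational label."""
    if volume_A3 < 2500:
        return "collapsed"
    if volume_A3 <= 3000:
        return "closed"
    if volume_A3 > 3200:
        return "open"
    return "intermediate"  # assumption: range not labeled in the text

def count_transitions(volumes):
    """Count open<->closed switches, ignoring intermediate and collapsed frames."""
    states = [s for s in map(classify_cavity, volumes) if s in ("open", "closed")]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)

# Toy volume series (not simulation data).
volumes = [2700, 3100, 3300, 3100, 2800, 3400]
labels = [classify_cavity(v) for v in volumes]
n_transitions = count_transitions(volumes)
print(labels, n_transitions)
```

Leaving a gap between the closed and open thresholds acts as a hysteresis band, so small volume fluctuations around a single cutoff are not counted as transitions.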
This repetitive behavior was visually confirmed to be a transition between open and closed conformation states of the metalloprotease (catalytic site volume above 3200 Å3 and between 2500 and 3000 Å3, respectively). This conformational transition was only observed in our simulations with both Ca2+ and Mn2+ bound to the enzyme, suggesting that, in the presence of Ca2+ ions, the dynamics of the serralysin-like C-terminus can influence the dynamics of the zinc-dependent serralysin-like domain, which in turn can affect molecular recognition processes and the formation of enzyme-ligand complexes.
Fig. 5. Catalytic cavity volume throughout simulations. Although the absence of Mn2+ buries the catalytic cavity (green), the ion’s presence maintains the cavity structure (purple). The presence of Ca2+ and Mn2+ induces multiple open-close transitions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Molecular mechanism proposed in this work for opening and closing the active site of extracellular metalloprotease from Pseudomonas fluorescens 07A. On the left, the closed structure of the enzyme. A twist in the serralysin-like C-terminus domain (1) induces the opening of two loop regions 136–141 and 196–204 (2), leading to the open structure of the enzyme’s active site observed on the right.
The role of both structural ions on the catalytic domain structure throughout the simulations was further investigated using principal component analysis (PCA) to evaluate the most common collective metalloprotease movements sampled by our simulations at 310 K. In all three systems, the most abundant eigenvector represented 25–30% of all sampled movements and was shown to be related to the closing of the enzyme catalytic cavity.
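The statement that an eigenvector "represents" a given percentage of the sampled movements corresponds to the fraction of the total positional variance carried by that principal component. The following NumPy sketch shows that calculation on toy data. It is an illustration under stated assumptions, not the analysis pipeline of the study: the input is assumed to be pre-aligned coordinates flattened to one row per frame, and the function name is hypothetical.

```python
import numpy as np

def pca_variance_fractions(coords):
    """Return (variance fractions, eigenvectors) sorted from largest to smallest mode.

    coords: array of shape (n_frames, n_features), e.g. flattened Cartesian coordinates.
    """
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)                   # remove the average structure
    cov = np.cov(x, rowvar=False)            # covariance matrix of the coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Toy data: displacements along feature 0 are larger than along feature 1.
toy = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
fractions, modes = pca_variance_fractions(toy)
print(fractions)
```

Projecting each frame onto the first two columns of `modes` gives the kind of two-dimensional map used to separate open and closed conformations.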
However, the second most abundant eigenvector in simulations with both Ca2+ and Mn2+ bound to the enzyme was found to describe a relationship between a twist on domain 259–476 and the opening of the same catalytic cavity, representing another 10% of all sampled movements. This collective twist-opening movement could not be detected when these structural ions were absent or in the presence of only Mn2+ at the catalytic site, in accordance with the findings regarding the cavity opening-and-closing cycles reported above. Plotting the top 2 modes of motion of the trajectories containing structural Ca2+ and Mn2+ ions allowed us to identify the open and closed conformations of the metalloprotease along the eigenvector axes (Supplementary Fig. 4). When combined, these results suggest that Ca2+ plays a key role in the dynamics of domain 259–476 and may lead to a major protein movement that could be crucial for molecular recognition of substrates in enzyme catalysis. Therefore, we propose the following dynamical model of the P. fluorescens 07A metalloprotease for active and inactive conformations adopted as a result of the enzyme's interactions with structural Ca2+ and Mn2+ ions. A twist in the serralysin-like C-terminus domain occurs only in the presence of Ca2+ ions and induces the opening of two loop regions (136–141 and 196–204), leading to the open structure of the enzyme in the active site, at the zinc-dependent serralysin-like domain (Fig. 6 and Video 1).
Video 1.
4. Conclusions
The structural and dynamical features of the P. fluorescens 07A metalloprotease and its interaction with structural Ca2+ and Mn2+ ions were studied here through molecular modelling and MD simulations. The sequence and structure of the P. fluorescens 07A metalloprotease were confirmed to be highly conserved among sequences of other Pseudomonas metalloproteases, allowing a broad comprehension of the features elucidated here.
Moreover, our simulations revealed collective movements related to dynamical transitions between active and inactive conformations. In the presence of Ca2+ ions, a twist in the C-terminus domain of this enzyme induces the opening of two loop regions, leading to an opening of the active site. The data presented in this work support the development of control measures against the activity of this and other enzymes from spoilage microorganisms found in the food industry.
Declaration of interest
None.
Acknowledgements
The authors gratefully acknowledge financial support from the following Brazilian agencies: Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). The authors also acknowledge the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil) for providing HPC resources of the SDumont supercomputer, which have contributed to the research results reported within this paper. URL: http://sdumont.lncc.br.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.foodchem.2019.01.204.
References
Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., & Lindahl, E. (2015). Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1–2, 19–25. Aghajari, N., Van Petegem, F., Villeret, V., Chessa, J. P., Gerday, C., Haser, R., & Van Beeumen, J. (2003). Crystal structures of a psychrophilic metalloprotease reveal new insights into catalysis by cold-adapted proteases. Proteins, 50, 636–647. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool.
Journal of Molecular Biology, 215(3), 403–410. Alves, M. P., Salgado, R. L., Eller, M. R., Dias, R. S., De Paula, S. O., & Carvalho, A. F. (2018). Temperature modulates the production and activity of a metalloprotease from Pseudomonas fluorescens 07A in milk. Journal of Dairy Science, 101, 992–999. Alves, M. P., Salgado, R. L., Eller, M. R., Vidigal, P. M. P., & Carvalho, A. F. (2016). Characterization of a heat-resistant extracellular protease from Pseudomonas fluorescens 07A shows that low temperature treatments are more effective in deactivating its proteolytic activity. Journal of Dairy Science, 99, 7842–7851. Andreani, N. A., Carraro, L., Fasolato, L., Baizan, S., Lucchini, R., Novelli, E., & Cardazzo, B. (2016). Characterisation of the thermostable protease AprX in strains of Pseudomonas fluorescens and impact on the shelf-life of dairy products: Preliminary results. Italian Journal of Food Safety, 5, 239–244. Ayora, S., & Götz, F. (1994). Genetic and biochemical properties of an extracellular neutral metalloprotease from Staphylococcus hyicus subsp. hyicus. Molecular and General Genetics, 242, 421–430. Babu, C. S., & Lim, C. (2005). Empirical force fields for biologically active divalent metal cations in water. Journal of Physical Chemistry A, 110(2), 691–699. Baumann, U. (1994). Crystal structure of the 50 kDa metallo protease from Serratia marcescens. Journal of Molecular Biology, 242, 244–251. Baumann, U., Bauer, M., Letoffe, S., Delepelaire, P., & Wandersman, C. (1995). Crystal structure of a complex between Serratia marcescens metallo-protease and an inhibitor from Erwinia chrysanthemi. Journal of Molecular Biology, 248, 653–661. Baumann, U., Wu, S., Flaherty, K. M., & McKay, D. B. (1993). Three-dimensional structure of the alkaline protease of Pseudomonas aeruginosa: A two-domain protein with a calcium binding parallel beta roll motif. The EMBO Journal, 12, 3357–3364. Berman, H. M., Bhat, T. N., Bourne, P. 
E., Feng, Z., Gilliland, G., Weissig, H., & Westbrook, J. (2000). The Protein Data Bank and the challenge of structural genomics. Nature Structural & Molecular Biology, 7, 957–959. Bussi, G., Donadio, D., & Parrinello, M. (2007). Canonical sampling through velocity rescaling. The Journal of Chemical Physics, 126(1). Fernandes, C. L., Ligabue-Braun, R., & Verli, H. (2015). Structural glycobiology of human α1-acid glycoprotein and its implications for pharmacokinetics and inflammation. Glycobiology, 25, 1125–1133. Fox, P. F., & McSweeney, P. L. H. (1998). Dairy Chemistry and Biochemistry (1st ed.). London: Blackie Academic & Professional. Grabarse, W., Mahlert, F., Duin, E. C., Goubeaud, M., Shima, S., Thauer, R. K., ... Ermler, U. (2001). On the mechanism of biological methane formation: Structural evidence for conformational changes in methyl-coenzyme m reductase upon substrate binding. Journal of Molecular Biology, 309, 315. Hamada, K., Hata, Y., Katsuya, Y., Hiramatsu, H., Fujiwara, T., & Katsube, Y. (1996). Crystal structure of Serratia protease, a zinc-dependent proteinase from Serratia sp. E15, containing a beta-sheet coil motif at 2.0 A resolution. Journal of Biochemistry (Tokyo), 119, 844–851. Hege, T., & Baumann, U. (2001). Protease C of Erwinia chrysanthemi: The crystal structure and role of amino acids Y228 and E189. Journal of Molecular Biology, 314, 187–193. Hege, T., Feltzer, R. E., Gray, R. D., & Baumann, U. (2001). Crystal structure of a complex between Pseudomonas aeruginosa alkaline protease and its cognate inhibitor: Inhibition by a zinc-NH2 coordinative bond. Journal of Biological Chemistry, 276, 35087–35092. Hess, B. (2008). P-LINCS: A parallel linear constraint solver for molecular simulation. Journal of Chemical Theory and Computation, 4(1), 116–122. Hess, B., Bekker, H., Berendsen, H. J. C., & Fraaije, J. G. E. M. (1997). LINCS: A linear constraint solver for molecular simulations. Journal of Computational Chemistry, 18(12), 1463–1472. 
Hoog, S. S., Zhao, B., Winborne, E., Fisher, S., Green, D. W., DesJarlais, R. L., ... Abdel-Meguid, S. S. (1995). A check on rational drug design: Crystal structure of a complex of human immunodeficiency virus type 1 protease with a novel gamma-turn mimetic inhibitor. Journal of Medicinal Chemistry, 38, 3246–3252. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., & Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 79(2), 926–935. Kufareva, I., & Abagyan, R. (2012). Methods of protein structure comparison. Methods in Molecular Biology, 857, 231–257. Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: Molecular evolutionary genetics analysis Version 7.0 for bigger datasets. Molecular Biology and Evolution, 33(7), 1870–1874. Larsson, A. (2014). AliView: A fast and lightweight alignment viewer and editor for large datasets. Bioinformatics, 30(22), 3276–3278. Ligabue-Braun, R., Sachett, L. G., Pol-Fachin, L., & Verli, H. (2015). The calcium goes meow: effects of ions and glycosylation on Fel d 1, the major cat allergen. PLoS One, 10, e0132311. Lindorff-Larsen, K., Piana, S., Palmo, K., Maragakis, P., Klepeis, J. L., Dror, R. O., & Shaw, D. E. (2010). Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins, 78(8), 1950–1958. Liu, S., Meng, Z., Fu, Y., & Zang, K. (2010). Insights derived from molecular dynamics simulation into the molecular motions of serine protease proteinase K. Journal of Molecular Modelling, 16, 17–28. Liu, S., Tao, Y., Meng, Z., Fu, Y., & Zang, K. (2011). The effect of calciums on molecular motions of proteinase K. Journal of Molecular Modelling, 17, 289–300. Machado, S. G., Baglinière, F., Marchand, S., Van Coillie, E., Vanetti, M. C. D., De Block, J., & Heyndrickx, M. (2017). The biodiversity of the microbiota producing heat-resistant enzymes responsible for spoilage in processed bovine milk and dairy products.
Frontiers in Microbiology, 8, 1–22. Marchand, S., Duquenne, B., Heyndrickx, M., Coudijzer, K., & De Block, J. (2017). Destabilization and off-flavors generated by Pseudomonas proteases during or after UHT-processing of milk. International Journal of Food Contamination, 4, 1–7. Marchler-Bauer, A., Bo, Y., Han, L., He, J., Lanczycki, C. J., Lu, S., ... Bryant, S. H. (2017). CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures. Nucleic Acids Research, 45(D1), D200–D203. McPhalen, C. A., Strynadka, N. C. J., & James, M. N. G. (1991). Calcium-binding sites in proteins: A structural perspective. Advances in Protein Chemistry, 42, 77–82. Ogle, J. M., Clifton, I. J., Rutledge, P. J., Elkins, J. M., Burzlaff, N. I., Adlington, M. R., Roach, P. L., & Baldwin, J. E. (2001). Alternative oxidation by isopenicillin N synthase observed by X-ray diffraction. Chemical Biology, 8, 1231. Paramo, T., East, A., Garzón, D., Ulmschneider, M. B., & Bond, P. J. (2014). Efficient characterization of protein cavities within molecular simulation trajectories: TRJ_cavity. Journal of Chemical Theory and Computation, 10(5), 2151–2164. Parrinello, M., & Rahman, A. (1981). Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics, 52(12), 7182–7190. Schneider, L. K., & Einsle, O. (2016). Role of calcium in secondary structure stabilization during maturation of nitrous oxide reductase. Biochemistry, 55, 1433–1440. Strynadka, N. C. J., & James, M. N. G. (1991). Towards an understanding of the effects of calcium on protein structure and function. Current Opinion in Structural Biology, 1, 905–914. Webb, B., & Sali, A. (2016). Comparative protein structure modeling using MODELLER. Current Protocols in Protein Science, 86(1) 2.9.1-2.9.37. Whelan, S., & Goldman, N. (2001). A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. 
APPENDIX B – Scripts

B.1 CSVMaker

CSVmaker is an algorithm required by the Least Square Fit method and must therefore be run beforehand. It creates CSV files from MOL2 files that carry charges obtained by quantum methods. Each CSV row holds the lower and upper charge bounds of one atom; the script writes default bounds of -1.0 and 1.0 and a fixed flag of 0, which the user must then edit. To run the algorithm, use:

python CSVmaker.py file.mol2

import sys


def convert(arq, output):
    """Write one CSV row per atom found in the MOL2 file `arq`."""
    with open(arq, "r") as f, open(output, "w") as f_out:
        # Skip the 7 header lines that precede the first atom record
        for _ in range(7):
            f.readline()
        # Accessing the atoms
        for line in f:
            fields = line.split()
            if not fields:
                continue
            # Stop at the record that follows the atom block (the bond section)
            if fields[0].startswith("@"):
                break
            columns = [
                fields[0],  # atom number
                fields[1],  # atom name
                "0",        # fixed flag (0 = not fixed)
                fields[2],  # X
                fields[3],  # Y
                fields[4],  # Z
                fields[8],  # reference (quantum) charge
                "-1.0",     # lower bound, to be edited by the user
            ]
            row = "".join((col + ",").ljust(15) for col in columns)
            # Upper bound is the last column (no trailing comma), to be edited by the user
            f_out.write(row + "1.0".ljust(15) + "\n")


if __name__ == "__main__":
    arq = sys.argv[1]
    # The output file is named after the first three characters of the input name
    output = arq[0:3] + ".csv"
    print(output)
    convert(arq, output)

B.2 Least Square Fit

LSQLfit is an algorithm that uses the lower and upper bounds proposed by the user to find atomic partial charges that preserve the direction and sense of the dipole moment vector obtained by quantum methods. It also reads a MOL2 file with the atomic partial charges obtained by quantum methods, lets the user set a coefficient Q that scales the magnitude of the final dipole moment vector, and lets the user set the final net (residual) charge C that the solution must have. To run the algorithm, use:

python LSQLfit.py file.csv ESP.mol2 Q C

# Bruno Iochins Grisci and Marcelo Depolo Poleto
# 18-APRIL-2017
# Usage: python getvector.py file1.csv file2-esp.mol2 1.0 0.0
# Usage: python getvector.py file1.csv file2-esp.mol2 module_coeficient (1.0 is default) molecule_charge (0.0 is default)
import sys

import numpy as np
from numpy import genfromtxt
from scipy.optimize import lsq_linear

# The .csv file should follow this format for each text line:
# index, atom_name, fixed_flag, x, y, z, ref_charge, charge_lower_bound, charge_upper_bound
# Each line represents an atom.
csv_name = sys.argv[1]
mol2_name = sys.argv[2]
coef = float(sys.argv[3]) if len(sys.argv) > 3 else 1.0
mol_charge = float(sys.argv[4]) if len(sys.argv) > 4 else 0.0

esp = genfromtxt(csv_name, delimiter=',')

# POS is the matrix of atom positions X, Y, Z. It is transposed in order to
# convert the .mol2 (and .csv) notation to the shape needed for the linear problem.
#
#   mol2              linear problem
#   X0 Y0 Z0          X0 X1 X2 ... Xn
#   X1 Y1 Z1    ->    Y0 Y1 Y2 ... Yn
#   X2 Y2 Z2          Z0 Z1 Z2 ... Zn
#   ...
#   Xn Yn Zn
POS = np.transpose(esp[:, 3:6])
C_ref = esp[:, -3]
C_lb = esp[:, -2]
C_up = esp[:, -1]

# The fixed flag: 0 if not fixed, 1 if fixed.
# Atoms marked as fixed (= 1) will have their charges fixed to the value of
# their lower bound charges. For fixed atoms the values of the lower bound and
# upper bound charge should be the same.
fixed = esp[:, 2]

# If the lower and upper bounds for a charge are equal, i.e., that charge value
# should be constant, a small value is added to the upper bound so the
# algorithm can run.
for bound in range(C_lb.size):
    if C_lb[bound] == C_up[bound]:
        C_up[bound] = C_up[bound] + 0.000001

# Reference dipole vector. The coefficient is applied here, before the charge
# constraints are appended, so that it changes only the module of the dipole
# (same direction) and leaves mol_charge and the fixed charges untouched.
K_ref = np.dot(POS, C_ref) * coef

# In order to satisfy the condition that the sum of all charges must be
# = mol_charge, a row of coefficients all equal to 1.0 is appended at the end
# of the POS matrix and the value mol_charge is appended at the end of the
# reference vector. This way, when solving the linear problem, one of the
# equations will be 1 * c0 + 1 * c1 + ... + 1 * cn = mol_charge
one_row = np.ones(shape=(1, POS.shape[1]))
POS = np.vstack([POS, one_row])
K_ref = np.append(K_ref, [mol_charge])

# For each atom marked with the fixed flag = 1, a row of coefficients all equal
# to 0.0 is appended at the end of the POS matrix, except for the coefficient
# of the marked atom, which is equal to 1.0, and the value of its lower bound
# charge is appended at the end of the reference vector. This way, one of the
# equations will be 0 * c0 + 0 * c1 + ... + 1 * cmarked + ... + 0 * cn = lb_charge
for flag in range(fixed.size):
    if fixed[flag] == 1:
        new_row = np.zeros(shape=(1, POS.shape[1]))
        new_row[:, flag] = 1.0
        POS = np.vstack([POS, new_row])
        K_ref = np.append(K_ref, [C_lb[flag]])

# Solving the linear problem POS * C_pred = K_ref
# We want to know the vector C_pred of new charge values that preserves the
# (scaled) module and the orientation of the reference dipole, obeys the lower
# and upper bounds for each charge and keeps the sum of all charges = mol_charge.
result = lsq_linear(POS, K_ref, bounds=(C_lb, C_up), lsmr_tol='auto', verbose=1)
C_pred = result.x

print('Check if charges respect constraints:')
for i in range(len(C_pred)):
    if C_lb[i] <= C_pred[i] <= C_up[i]:
        print(str(C_pred[i]) + ' OK!')
    else:
        print(str(C_pred[i]) + ' outside ' + str((C_lb[i], C_up[i])))
print('Charges: \n' + str(C_pred))
print('Molecule total charge: ' + str(sum(C_pred)))

# The next lines read the .mol2 file and change its charges to the new charges
# in C_pred, saving the new results in a *-lsql.mol2 file.
out_name = mol2_name.replace('esp', 'lsql')
if out_name == mol2_name:
    # Avoid overwriting the input when its name does not contain 'esp'
    out_name = mol2_name.replace('.mol2', '') + '-lsql.mol2'
with open(mol2_name, 'r') as in_f, open(out_name, 'w') as out_f:
    # Copy the @MOLECULE section
    for i in range(7):
        out_f.write(in_f.readline())
    # Copy the @ATOM section changing the charge value
    for c in C_pred:
        l = list(in_f.readline())
        str_c = str(c)
        if c >= 0.0:
            str_c = ' ' + str_c
        l[-12:-2] = str_c
        out_f.write(''.join(l))
    # Copy the @BOND section
    for l in in_f:
        out_f.write(l)

B.3 ConfID

The tool relies on the concept that each molecular conformation results from a particular set of dihedral angles, so the combination of different dihedral angles produces the possible conformational populations. The algorithm therefore takes the dihedral distribution curves of the torsions of a molecule, obtained along a simulation, and identifies the number of populations in each dihedral. The combination of the different dihedral populations then yields the conformational populations reached in solution. Next, the tool scans the files with the values of each dihedral as a function of time and computes the relative abundance of each conformational population from the number of frames in which the angles of all analyzed dihedrals belong to that conformational population. To run the tool, an input.inp file is required, containing:

DIH1.dist.xvg, DIH1.aver.xvg
DIH2.dist.xvg, DIH2.aver.xvg
DIH3.dist.xvg, DIH3.aver.xvg
...

where DIHx.dist.xvg is the file with the dihedral distribution of dihedral x and DIHx.aver.xvg is the file with the values of dihedral angle x as a function of time.
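The counting idea described above can be illustrated with a toy sketch. It is not the ConfID implementation: the region boundaries in `REGIONS`, the angle series and the helper names `assign_region` and `conformer_populations` are hypothetical, chosen only to show how per-dihedral regions combine into conformational populations whose abundance is the fraction of frames they hold.

```python
from collections import Counter

# Hypothetical regions (degrees) for two dihedrals. In ConfID these would be
# derived from the dihedral distribution files; here they are made up.
REGIONS = {
    "DIH1": [(-120, 0), (0, 120)],
    "DIH2": [(-180, -60), (60, 180)],
}


def assign_region(angle, regions):
    """Return the first region (low, high] that contains angle, or None."""
    for low, high in regions:
        if low < angle <= high:
            return (low, high)
    return None


def conformer_populations(series, regions):
    """Label every frame by its tuple of dihedral regions and return the
    relative abundance (fraction of frames) of each distinct label."""
    names = sorted(series)
    n_frames = len(series[names[0]])
    labels = [
        tuple(assign_region(series[name][i], regions[name]) for name in names)
        for i in range(n_frames)
    ]
    counts = Counter(labels)
    return {label: count / n_frames for label, count in counts.items()}


# Toy time series: one angle per frame for each dihedral (made-up values)
series = {
    "DIH1": [-60.0, -55.0, 65.0, 70.0],
    "DIH2": [-170.0, -165.0, 170.0, -175.0],
}
populations = conformer_populations(series, REGIONS)
for label, freq in sorted(populations.items(), key=lambda kv: -kv[1]):
    print(label, freq)
```

With these made-up values, two of the four frames share the same pair of regions, so that conformational population has a relative abundance of 0.5.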
# Bruno Iochins Grisci and Marcelo Depolo Poleto
# FEBRUARY/2018
import os
import sys
import pprint
from collections import Counter

import numpy as np

from populations import pops


class region:
    """One population (angular region) of a dihedral and the angles that fall in it."""

    def __init__(self, reg, points_file):
        self.reg = reg
        self.points_file = points_file
        self.points = []

    def get_count(self):
        return len(self.points)

    def get_freq(self):
        # Fraction of the data lines of the time series that fall in this region
        total_points = 0.0
        with open(self.points_file, 'r') as pf:
            for line in pf:
                if '#' in line or '@' in line:
                    continue
                total_points += 1.0
        return len(self.points) / total_points

    def get_mean(self):
        m = np.mean(np.array(self.points))
        # Angles of a region that wraps around +/-180 are stored shifted by 360
        if m > 180.0:
            m = m - 360.0
        return m

    def get_std(self):
        return np.std(np.array(self.points))

    def __repr__(self):
        return '{:24s} # count: {:6d} # freq: {:6.3f} # mean: {:8.3f} # std: {:6.3f}'.format(
            str(self.reg), self.get_count(), round(self.get_freq(), 3),
            round(self.get_mean(), 3), round(self.get_std(), 3))


if __name__ == '__main__':
    input_files = sys.argv[1]
    alias = os.path.basename(input_files).replace('.inp', '')
    DATA = {}
    ANGLES = []
    TIMES = []

    # One line per dihedral: distribution file, time-series file
    with open(input_files, 'r') as infs:
        for line in infs:
            if not line.strip():
                continue
            files = line.split(',')
            files[0] = files[0].rstrip().replace(' ', '')
            files[1] = files[1].rstrip().replace(' ', '')
            p = pops(files[0])
            rs = []
            for r in p.regions:
                rs.append(region(r, files[1]))
            DATA[files[0]] = rs

    # Assign every frame of every dihedral to one of its regions
    keys = sorted(DATA)
    for k in keys:
        with open(DATA[k][0].points_file, 'r') as points:
            print(k, DATA[k][0].points_file)
            a = []
            t = []
            for line in points:
                if '#' in line or '@' in line:
                    continue
                angle = float(line.split()[1])
                time = float(line.split()[0])
                t.append(time)
                found = False
                for r in DATA[k]:
                    if len(r.reg) == 2:
                        if angle > r.reg[0] and angle <= r.reg[1]:
                            r.points.append(angle)
                            a.append(r.reg)
                            found = True
                    elif len(r.reg) == 4:
                        # Region that wraps around +/-180
                        if angle > r.reg[0] and angle <= r.reg[1]:
                            r.points.append(angle + 360.0)
                            a.append(r.reg)
                            found = True
                        elif angle > r.reg[2] and angle <= r.reg[3]:
                            r.points.append(angle)
                            a.append(r.reg)
                            found = True
                    else:
                        print('ERROR')
                if not found:
                    # Frame outside every region of this dihedral
                    a.append('z')
            ANGLES.append(a)
            TIMES.append(t)

    for z in zip(ANGLES, TIMES):
        print(len(z[0]), len(z[1]))

    # Each frame is labelled by the tuple of regions of all dihedrals
    count = list(zip(*ANGLES))
    counter = Counter(count)
    regions_times = {}

    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(DATA)
    with open(alias + '_REGIONS.txt', 'w') as reg_file:
        pprint.pprint(DATA, stream=reg_file)

    with open(alias + '_TOP.txt', 'w') as top_file:
        total_points = 0
        total_freq = 0.0
        top_file.write('Most common:')
        print('Most common:')
        for rs, cs in counter.most_common():
            regions_times[rs] = []
            freq = round(float(cs) / float(len(ANGLES[0])), 6)
            total_points += cs
            total_freq += freq
            top_file.write('\n{:135s}: {:6f} ({:6d})'.format(str(rs), freq, cs))
            print('{:135s}: {:6f} ({:6d})'.format(str(rs), freq, cs))
        top_file.write("\nTotal number of points: {:6f} ({:6d})".format(total_freq, total_points))
        print("Total number of points: {:6f} ({:6d})".format(total_freq, total_points))

    # Collect the simulation times of the frames of each conformational population
    for i in range(len(count)):
        if count[i] in regions_times:
            regions_times[count[i]].append(TIMES[0][i])
    tt = 0
    for k in regions_times:
        tt += len(regions_times[k])
    print(tt)

    # One file per population: the region tuple followed by one time per line
    i = 0
    for rs, cs in counter.most_common():
        with open(alias + '_' + str(i) + '.txt', 'w') as time_file:
            time_file.write(str(rs))
            for t in regions_times[rs]:
                time_file.write('\n' + str(t))
        i += 1

The populations.py script is imported by the master script ConfID.py, so both must be present in the working directory.
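Before the listing, the way a single dihedral distribution can be split into populations may be easier to follow on a synthetic curve. The sketch below is a simplified illustration, not populations.py itself: it omits the smoothing step and the wrap-around handling at ±180°, the function name `split_into_regions` and the two-Gaussian test curve are hypothetical, and only the relative height threshold (one twentieth of the maximum) mirrors the listing.

```python
import numpy as np


def split_into_regions(angles, density, min_rel_height=0.05):
    """Return one (start_angle, end_angle) pair per peak of `density`,
    each bounded by the nearest valleys (local minima or the curve ends)."""
    top = density.max()
    interior = range(1, density.size - 1)
    # A peak is a strict local maximum that is not negligibly small
    peaks = [i for i in interior
             if density[i] > density[i - 1] and density[i] > density[i + 1]
             and density[i] >= min_rel_height * top]
    # Valleys are local minima; the two ends of the curve also act as limits
    valleys = [0] + [i for i in interior
                     if density[i] <= density[i - 1] and density[i] <= density[i + 1]]
    valleys.append(density.size - 1)
    regions = []
    for p in peaks:
        start = max(v for v in valleys if v < p)
        end = min(v for v in valleys if v > p)
        regions.append((int(angles[start]), int(angles[end])))
    return regions


# Synthetic bimodal distribution: two Gaussian populations at -60 and +60 degrees
angles = np.arange(-179, 181)
density = (np.exp(-0.5 * ((angles + 60) / 15.0) ** 2)
           + 0.5 * np.exp(-0.5 * ((angles - 60) / 15.0) ** 2))
regions = split_into_regions(angles, density)
print(regions)
```

Each returned pair plays the role of one region in ConfID.py, where a frame is assigned to the region whose interval contains its angle.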
# Bruno Iochins Grisci and Marcelo Depolo Poleto
# FEBRUARY/2018
import re

import numpy as np


class pops:
    """Splits a dihedral distribution into regions (populations)."""

    def __init__(self, data_file_name):
        self.data_file_name = data_file_name
        angles = []
        dists = []
        with open(self.data_file_name, 'r') as df:
            for line in df:
                # Skip headers and the -180 line, which duplicates +180
                if '#' in line or '@' in line or '-180' in line:
                    continue
                l = line.split()
                angles.append(int(l[0]))
                dists.append(float(l[1]))
        angles = np.array(angles)
        dists = np.array(dists)

        # Shift the angles below the global minimum by 360 so that the curve
        # starts and ends at its lowest point (no peak is cut at +/-180)
        min_value_index = int(np.argmin(dists))
        translation = np.array([360] * min_value_index + [0] * (angles.size - min_value_index))
        angles = angles + translation
        sorter = np.argsort(angles)

        s = self.smooth(dists[sorter])
        peaks = self.find_peaks(s)
        pk = np.zeros(angles.size)
        for i in peaks:
            pk[i] = s[i]
        valleys = self.find_valleys(s)
        vl = np.zeros(angles.size)
        for i in valleys:
            vl[i] = np.max(s) / 4.0
        self.regions = self.get_regions(peaks, valleys, angles[sorter])

        self.save(self.data_file_name.replace('.xvg', 'shift'), angles[sorter], dists[sorter])
        self.save(self.data_file_name.replace('.xvg', 'smooth'), angles[sorter], s)
        self.save(self.data_file_name.replace('.xvg', 'peaks'), angles[sorter], pk)
        self.save(self.data_file_name.replace('.xvg', 'valleys'), angles[sorter], vl)

    def smooth(self, x, window_len=21, window='hanning'):
        if x.ndim != 1:
            raise ValueError("smooth only accepts 1 dimension arrays.")
        if x.size < window_len:
            raise ValueError("Input vector needs to be bigger than window size.")
        if window_len < 3:
            return x
        if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']:
            raise ValueError("Window is one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'")
        # Pad both ends with mirrored copies of the signal
        s = np.r_[x[window_len - 1:0:-1], x, x[-2:-window_len - 1:-1]]
        if window == 'flat':  # moving average
            w = np.ones(window_len, 'd')
        else:
            w = getattr(np, window)(window_len)
        y = np.convolve(w / w.sum(), s, mode='valid')
        # Trim the padding so the output has the same length as the input
        # (10 points on each side for the default window of 21)
        half = (window_len - 1) // 2
        return y[half:-half]

    def find_peaks(self, distribution):
        peaks = []
        threshold = np.max(distribution) / 20.0
        for i in range(1, distribution.size - 1):
            if (distribution[i] > distribution[i - 1] and
                    distribution[i] > distribution[i + 1] and
                    distribution[i] >= threshold):
                peaks.append(i)
        return peaks

    def find_valleys(self, distribution):
        valleys = []
        threshold = np.max(distribution) / 300.0
        for i in range(1, distribution.size - 1):
            if (distribution[i] <= distribution[i - 1] and
                    distribution[i] <= distribution[i + 1]) or distribution[i] <= threshold:
                valleys.append(i)
        return valleys

    def get_regions(self, peaks, valleys, angles):
        regions = []
        for p in peaks:
            start = max([v for v in valleys if v < p])
            end = min([v for v in valleys if v > p])
            a_start = int(angles[start])
            a_end = int(angles[end])
            if a_start <= 180 and a_end <= 180:
                regions.append((a_start, a_end))
            if a_start > 180 and a_end > 180:
                regions.append((a_start - 360, a_end - 360))
            if a_start <= 180 and a_end > 180:
                # Region that wraps around +/-180: two intervals in one tuple
                regions.append((-180, a_end - 360, a_start, 180))
        return regions

    def save(self, file_name, angles, dists):
        with open(file_name + '.xvg', 'w') as of:
            of.write(re.sub(r' *\n *', '\n',
                            np.array_str(np.c_[angles, dists]).replace('[', '').replace(']', '').strip()))

B.4 VirtualTrajMaker

The MakeVirtualTraj.py script below is useful for generating a subtrajectory that contains only the frames of a given conformational population identified by ConfID.py. This makes it possible to investigate structural chemical information in more depth, such as the intramolecular interactions of a given conformation or its interactions with the solvent.

###################################### READ ME #######
# Usage: python make_virtual_traj.py file.tpr file.xtc ascendent_list_of_frames.txt
#
# Remember: longer the trajectory, longer the time to run this. Good luck!
# ps: use a clean folder to run this script. Be organised!
# ps: DO NOT use a trajectory with a number that cannot be divided by 10.
#
# marcelodepolo@gmail.com
#
#########################################
# Importing libraries
import math
import os
import shutil
import sys

#########################################
# Defining arguments
ver = '514'
tprfile = sys.argv[1]
trajfile = sys.argv[2]
frame_input = sys.argv[3]

########################################
# Defining clustering parameters
stime = 1000000  # simulation time in ps
chops = 10       # the trajectory is split into 10 pieces, each with 10% of the total time


def read(file):
    """Return the list of frame times; the first line of the file is a header."""
    frame_list = []
    with open(file, "r") as f:
        f.readline()
        while True:
            string = f.readline().strip(' \t\n\r')
            if string == '':
                break
            frame_list.append(float(string.split()[0]))
    return frame_list


########################################
# Chopping trajectory
def chop_traj(ver):
    os.makedirs('chop_traj', exist_ok=True)
    step = stime // chops
    for i in range(chops):
        os.system('echo System | gmx_' + ver + ' trjconv -s ' + tprfile + ' -f ' + trajfile +
                  ' -tu ps -b ' + str(i * step) + ' -e ' + str((i + 1) * step) +
                  ' -o chop_traj/traj-' + str((i + 1) * step) + '.xtc')


########################################
# Extracting frames from each dihedral
def dump_frames(file):
    os.makedirs('dump_frames', exist_ok=True)
    step = stime // chops
    for j in read(file):
        out = 'dump_frames/fr_' + str(j) + '.xtc'
        if os.path.exists(out) or not 0 <= j <= stime:
            continue
        # Piece k covers the interval ((k-1)*step, k*step]; time 0 belongs to piece 1
        k = max(1, math.ceil(j / step))
        os.system('echo System | gmx_' + ver + ' trjconv -s ' + tprfile +
                  ' -f chop_traj/traj-' + str(k * step) + '.xtc -tu ps -dump ' + str(j) +
                  ' -o ' + out)


########################################
# Concatenating frames from each dihedral
def concat_frames(file):
    frame_list = read(file)
    if len(frame_list) < 2:
        print('\n##############################\n>>> We could not read the frame list to '
              'concatenate the frames. Check your input!\n')
        return
    os.makedirs('temp_trajs', exist_ok=True)
    os.makedirs('final_trajs', exist_ok=True)

    # Creating partial trajectories with at most 1000 frames each
    os.chdir('dump_frames/')
    groups = [frame_list[i:i + 1000] for i in range(0, len(frame_list), 1000)]
    for k, group in enumerate(groups):
        frames = ' '.join('fr_' + str(j) + '.xtc' for j in group)
        # One "c" (continue) answer per file for the -settime prompt
        with open('cat_list-' + str(k) + '.txt', 'w') as c:
            c.write('c\n' * len(group))
        os.system('gmx_' + ver + ' trjcat -f ' + frames + ' -settime -o ../temp_trajs/traj-' +
                  str(k) + '.xtc < cat_list-' + str(k) + '.txt')
    os.chdir('../')

    # Concatenating partial trajectories
    os.chdir('temp_trajs/')
    parts = ' '.join('traj-' + str(k) + '.xtc' for k in range(len(groups)))
    with open('temp-cat.txt', 'w') as c:
        c.write('c\n' * len(groups))
    os.system('gmx_' + ver + ' trjcat -f ' + parts +
              ' -settime -o ../final_trajs/traj-final_temp.xtc < temp-cat.txt')
    os.chdir('../')


########################################
# Changing the timestep
def change_timestep(file):
    frame_list = read(file)
    t0 = frame_list[0]
    t1 = frame_list[1]
    dt = int(t1 - t0)
    os.system('echo Other Other System | gmx_' + ver + ' trjconv -s ' + tprfile +
              ' -f final_trajs/traj-final_temp.xtc -tu ps -center -fit rot+trans -timestep ' +
              str(dt) + ' -o final_trajs/traj-final.xtc')


########################################
# Reset folders
def reset():
    for folder in ('final_trajs', 'temp_trajs', 'dump_frames', 'chop_traj'):
        if os.path.exists(folder):
            shutil.rmtree(folder)


def main(ver, frame_input):
    reset()
    chop_traj(ver)
    dump_frames(frame_input)
    concat_frames(frame_input)
    change_timestep(frame_input)
    print("\n>>> Done! Have a nice day!\n")


if __name__ == '__main__':
    main(ver, frame_input)

CURRICULUM VITÆ

Academic background:
2009–2015 Bachelor's degree in Biochemistry, Universidade Federal de Viçosa (UFV), Viçosa - MG.
2015–2016 Master's degree in Cell and Molecular Biology, Centro de Biotecnologia - UFRGS, Porto Alegre.
2016–present Doctorate in Cell and Molecular Biology, Centro de Biotecnologia - UFRGS, Porto Alegre. In progress.

Scientific work presented at conferences:
POLÊTO, M. D.; VERLI, H. Conformational dynamics and interaction network of bioactive molecules in aqueous solution. In: 47a Reunião Anual da Sociedade Brasileira de Bioquímica e Biologia Celular, 2018, Joinville. Programa e Resumos da 47a Reunião Anual da Sociedade Brasileira de Bioquímica e Biologia Celular, 2018.
POLÊTO, M. D.; VERLI, H. Parametrization of small molecules towards ligand-receptor complexation dynamics.
In: 46a Reunião Anual da Sociedade Brasileira de Bioquímica e Biologia Celular, 2017, Águas de Lindóia. Programa e Resumos da 46a Reunião Anual da Sociedade Brasileira de Bioquímica e Biologia Celular, 2017.
POLÊTO, M. D.; VERLI, H. smGROMOS: a force field for drug design. In: 45a Reunião Anual da Sociedade Brasileira de Bioquímica e Biologia Celular, 2016, Natal. Programa e Resumos da 45a Reunião Anual da Sociedade Brasileira de Bioquímica e Biologia Celular, 2016.
POLÊTO, M. D.; VERLI, H. Potential use of GROMOS force field parameters for organic molecules. In: Escola Gaúcha de Bioinformática, 2015, Porto Alegre. Escola Gaúcha de Bioinformática, 2015.
POLÊTO, M. D.; TEIXEIRA, J. A.; FIETTO, J. L. R.; BRESSAN, G. C.; JUNIOR, A. S.; ALMEIDA, M. R. In silico analysis of Porcine circovirus 2 capside protein: genetic diversity associated with cell adhesion and viral stability. In: 23rd Congress of the International Union of Biochemistry and Molecular Biology, 2015, Foz do Iguaçu. 23rd Congress of the International Union of Biochemistry and Molecular Biology, 2015.

Publications in specialized journals:
POLÊTO, MARCELO D.; ALVES, MAURA P.; LIGABUE-BRAUN, RODRIGO; ELLER, MONIQUE R.; DE CARVALHO, ANTÔNIO F. Role of structural ions on the dynamics of the Pseudomonas fluorescens 07A metalloprotease. Food Chemistry, v. x, p. x-x, 2019.
ARANTES, PABLO; POLÊTO, MARCELO D.; JOHN, ELISA B.; PEDEBOS, CONRADO; GRISCI, BRUNO I.; DORN, MARCIO; VERLI, HUGO. Development of GROMOS-Compatible Parameter Set for Simulations of Chalcones and Flavonoids. Journal of Physical Chemistry B, v. x, p. x-x, 2019.
TESCH, ROBERTA; BECKER, CHRISTIAN; MÜLLER, MATTHIAS P.; BECK, MICHAEL E.; QUAMBUSCH, LENA; GETLIK, MATTHÄUS; LATEGAHN, JONAS; UHLENBROCK, NIKLAS; COSTA, FANNY N.; POLÊTO, MARCELO D.; DE SENA MURTEIRA PINHEIRO, PEDRO; RODRIGUES, DANIEL A.; SANT'ANNA, CARLOS M.; FERREIRA, FABIO F.; VERLI, HUGO; FRAGA, CARLOS ALBERTO M.; RAUH, DANIEL. An Unusual Intramolecular Halogen Bond Guides Conformational Selection. Angewandte Chemie International Edition, v. 57(31), p. 9970-9975, 2018.
POLÊTO, MARCELO D.; RUSU, VICTOR H.; GRISCI, BRUNO I.; DORN, MARCIO; LINS, ROBERTO D.; VERLI, HUGO. Aromatic Rings Commonly Used in Medicinal Chemistry: Force Fields Comparison and Interactions With Water Toward the Design of New Chemical Entities. Frontiers in Pharmacology, v. 9, p. 395, 2018.
FISCHER, NINA M.; POLÊTO, MARCELO D.; STEUER, JAKOB; VAN DER SPOEL, DAVID. Influence of Na+ and Mg2+ ions on RNA structures studied with molecular dynamics simulations. Nucleic Acids Research (Online), v. 46(10), p. 4872-4882, 2018.
LIMA, RAYANE N.; FAHEEM, MUHAMMAD; BARBOSA, JOÃO A. R. G.; POLÊTO, MARCELO D.; VERLI, HUGO; MELO, FERNANDO L.; RESENDE, RENATO O. Homology modeling and molecular dynamics provide structural insights into tospovirus nucleoprotein. BMC Bioinformatics, v. 17, p. 11-17, 2016.
GUIMARÃES-PEIXOTO, RAFAELLA P. M.; PINTO, PAULO S. A.; SANTOS, M. R.; POLÊTO, MARCELO D.; SILVA, LETÍCIA F.; SILVA-JÚNIOR, ABELARDO. Evaluation of a synthetic peptide from the Taenia saginata 18kDa surface/secreted oncospheral adhesion protein for serological diagnosis of bovine cysticercosis. Acta Tropica, v. 164, p. 463-468, 2016.
FIGUEIRA, FLÁVIO; FARINHA, ANDREIA S. F.; MUTETO, PAULINO V.; POLÊTO, MARCELO D.; VERLI, HUGO; GOMES, M. TERESA S. R.; TOMÉ, AUGUSTO C.; CAVALEIRO, JOSÉ A. S.; TOMÉ, JOÃO P. C. [28]Hexaphyrin derivatives for anion recognition in organic and aqueous media. Chemical Communications (London. 1996. Print), v. 52, p. 2181-2184, 2016.
RAMOS, CATARINA I. V.; FIGUEIRA, FLÁVIO; POLÊTO, MARCELO D.; AMADO, FRANCISCO M. L.; VERLI, HUGO; TOMÉ, JOÃO P. C.; NEVES, M. GRAÇA P. M. S. ESI-MS/MS of expanded porphyrins: a look into their structure and aromaticity. Journal of Mass Spectrometry (Print), v. 51, p. 342-349, 2016.
SIQUEIRA, RAONI PAIS; BARBOSA, ÉVERTON DE ALMEIDA ALVES; POLÊTO, MARCELO DEPÓLO; RIGHETTO, GERMANNA LIMA; SERAPHIM, THIAGO VARGAS; SALGADO, RAFAEL LOCATELLI; FERREIRA, JOANA GASPERAZZO; BARROS, MARCUS VINÍCIUS DE ANDRADE; DE OLIVEIRA, LEANDRO LICURSI; LARANJEIRA, ANGELO BRUNELLI ALBERTONI; ALMEIDA, MÁRCIA ROGÉRIA; JÚNIOR, ABELARDO SILVA; FIETTO, JULIANA LOPES RANGEL; KOBARG, JÖRG; DE OLIVEIRA, EDUARDO BASÍLIO; TEIXEIRA, ROBSON RICARDO; BORGES, JÚLIO CÉSAR; YUNES, JOSE ANDRÉS; BRESSAN, GUSTAVO COSTA. Potential Antileukemia Effect and Structural Analyses of SRPK Inhibition by N-(2-(Piperidin-1-yl)-5-(Trifluoromethyl)Phenyl)Isonicotinamide (SRPIN340). PLoS One, v. 10, p. e0134882, 2015.
SALGADO, RAFAEL LOCATELLI; VIDIGAL, PEDRO MARCUS PEREIRA; GONZAGA, NATALIA F.; DE SOUZA, LUIZ F. L.; POLÊTO, MARCELO D.; ONOFRE, THIAGO SOUZA; ELLER, MONIQUE R.; PEREIRA, CARLOS EDUARDO REAL; FIETTO, JULIANA L. R.; BRESSAN, GUSTAVO C.; GUEDES, ROBERTO M. C.; ALMEIDA, MÁRCIA R.; SILVA JÚNIOR, ABELARDO. A porcine circovirus-2 mutant isolated in Brazil contains low-frequency substitutions in regions of immunoprotective epitopes in the capsid protein. Archives of Virology, v. 160 (11), p. 2741-2748, 2015.

Scholarships awarded:
2010–2011 FAPEMIG scholarship holder, Laboratório de Infectologia Molecular Animal - LIMA, UFV. Evaluation of the genetic diversity of PCV-2 in Brazil and study of the conservation of immunogenic regions for the production of vaccine candidates.
2011–2012 FAPEMIG scholarship holder, Laboratório de Infectologia Molecular Animal - LIMA, UFV. Study of the genetic diversity of circulating Infectious Bursal Disease Virus (IBDV) isolates.
2012–2013 FAPEMIG scholarship holder, Laboratório de Infectologia Molecular Animal - LIMA, UFV. Molecular analysis of the interaction of Porcine Circovirus 2 viral variants with glycosaminoglycan receptors and proposal of possible inhibitors.
2013–2014 Ciência sem Fronteiras scholarship holder, David van der Spoel's Lab - Uppsala Universitet, UU. Ionic dependence in RNA dynamics.
2015–2016 Master's scholarship holder, Grupo de Bioinformática Estrutural - GBE, UFRGS. Parametrization of aromatic rings commonly used in drug development and medicinal chemistry.

Public examinations passed:
Ranked 5th in Public Examination (Concurso Público) 67/2018 - Area/Subarea: Biochemistry and Molecular Biology/Biochemistry and Molecular Biology, held by the Department of Biochemistry and Molecular Biology of the Universidade Federal de Viçosa (UFV).