Activation addition: Steering language models without optimization. AM Turner, L Thiergart, G Leech, D Udell, JJ Vazquez, U Mini, et al. arXiv e-prints, arXiv:2308.10248, 2023 | 130 | 2023 |
Optimal Policies Tend to Seek Power. AM Turner, L Smith, R Shah, P Tadepalli. Thirty-Fifth Conference on Neural Information Processing Systems, 2021 | 92 | 2021 |
Steering Llama 2 via contrastive activation addition. N Panickssery, N Gabrieli, J Schulz, M Tong, E Hubinger, AM Turner. arXiv preprint arXiv:2312.06681, 2023 | 81 | 2023 |
Conservative agency via attainable utility preservation. AM Turner, D Hadfield-Menell, P Tadepalli. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 385-391, 2020 | 67 | 2020 |
Avoiding side effects in complex environments. AM Turner, N Ratzlaff, P Tadepalli. Advances in Neural Information Processing Systems, 2020 | 49 | 2020 |
Parametrically retargetable decision-makers tend to seek power. A Turner, P Tadepalli. Advances in Neural Information Processing Systems 35, 31391-31401, 2022 | 23 | 2022 |
Controlled motion with the XL-TDR lateral-approach lumbar total disk replacement: in vitro kinematic investigation. L Pimenta, A Turner, L Oliveira, L Marchi, B Cornwall. Journal of Neurological Surgery Part A: Central European Neurosurgery 76 (02 …, 2015 | 8 | 2015 |
Understanding and Controlling a Maze-Solving Policy Network. U Mini, P Grietzer, M Sharma, A Meek, M MacDiarmid, AM Turner. arXiv preprint arXiv:2310.08043, 2023 | 6 | 2023 |
On avoiding power-seeking by artificial intelligence. AM Turner. arXiv preprint arXiv:2206.11831, 2022 | 3 | 2022 |
Formalizing the problem of side effect regularization. AM Turner, A Saxena, P Tadepalli. arXiv preprint arXiv:2206.11812, 2022 | 2 | 2022 |