Centre for Creative Arts and Technologies Department of Music and Media, Faculty of Arts and Social Sciences, University of Surrey
Biography
Andrew Gilbert is an Associate Professor at the University of Surrey. His academic pursuits are primarily focused on video understanding and Generative Models. His research portfolio comprises over 65 articles published in the leading international vision conferences and journals, and he co-leads the C-CATS research group at Surrey.
Dr Gilbert’s extensive research ranges from intelligent creative arts, such as fine-grained style search, movie trailer genre understanding, and 4D performance capture), to enable computers to perceive and understand their complex and cluttered surroundings using multiple training techniques, including self-supervised and multiple data modes. Previously, his research encompassed 3D human pose estimation and complex real-world activity recognition, with early work on tracking people on vast surveillance networks.
Moreover, Dr. Gilbert is an active British Machine Vision Association (BMVA) (BMVA) Executive Committee member and coordinates the national BMVA technical meetings. These meetings offer a forum for key experts from industry and academia to discuss and identify solutions to current problems in specialist areas of computer vision and machine learning.
PhD Students
I’m always looking for good PhD candidates, but normally, when I have funding available, it is for UK students. This is sometimes due to restrictions on the funding, but more often than not, it has more to do with the difference between home and overseas fees. However, if you are an exceptional candidate, email me. I am happy to work with high-calibre candidates to obtain scholarships either internally or from external funders such as the China Scholarship Council or the Commonwealth Scholarship Council.
Current PhD Students
- Adrienne Deganutti - Long Term Video Captioning 2023
- Xu Dong - Group Activity Recognition in Video 2023
- Mona Ahmadian - Self-Supervised Audio Visual Video Understanding 2022
- Soon Yau - Text to Storyboard generation - 2021
- Kar Balan - Decentralized virtual content Understanding for Blockchain- 2021
- Katharina Botel-Azzinnaro - Real stories through faked media: Non-fiction in AI-generated environments 2021
- Tony Orme - Temporal Prediction of IP packets in network switches - 2019
Alumi
- Ed Fish - Film Trailer Genre Understanding and Exploration - 2019
- Dan Ruta - Exploring and understanding fine-grained style - 2019
- Violeta Menendez Gonzalez - Novel viewpoint and stereo inpainting - 2019
- Kary Ho 2019
- Mat Trumble - 2015
- Phil Krejov - 2012
- Segun Oshin - 2008
Gilbertine Weekend Away
Working together is cruical for my lab and PhD student, so we go on a yearly trip away
Research Updates
2024
Boosting Camera Motion Control for Video Diffusion Transformers
Soon Yau Cheong, Duygu Ceylan, Armin Mustafa, Andrew Gilbert, Chun-Hao Paul Huang, Arxiv Preprint 2410.10802, 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian, Frank Guerin, Andrew Gilbert, The 35th British Machine Vision Conference (BMVC’24) 2024
Interpretable Long-term Action Quality Assessment
Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye,Andrew Gilbert, The 35th British Machine Vision Conference (BMVC’24) (Oral) 2024
PDFed: Privacy-Preserving and Decentralized Asynchronous Federated Learning for Diffusion Models
Kar Balan, Andrew Gilbert, John Collomosse, Conference on Visual Media Production (CVMP’24) 2024
Detection and Re-Identification in the case of Horse Racing
Will Binning, Sadegh Rahmani, Xu Dong, Andrew Gilbert, Conference on Visual Media Production (CVMP’24) 2024
Thinking Outside the BBox: Unconstrained Generative Object Compositing
Gemma C Tarrés, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Ruta, Andrew Gilbert, John Collomosse, Soo Ye Kim, European Conference on Computer Vision ECCV’24 2024
ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet
Soon Cheong, Armin Mustafa, Andrew Gilbert, European Conference of Computer Vision 2024, FashionAI: Exploring the intersection of Fashion and Artificial Intelligence for reshaping the Industry, 2024
Diff-nst: Diffusion interleaving for deformable neural style transfer
Dan Ruta, Gemma C Tarrés, Andrew Gilbert, Eli Shechtman, Nick Kolkin, John Collomosse, European Conference of Computer Vision 2024, Vision for Art (VISART VII) Workshop, 2024
Aladin-nst: Self-supervised disentangled representation learning of artistic style through neural style transfer
Dan Ruta, Gemma Canet Tarres, Alexander Black, Andrew Gilbert, John Collomosse, European Conference of Computer Vision 2024, Vision for Art (VISART VII) Workshop, 2024
NeAT: Neural Artistic Tracing for Beautiful Style Transfer
Dan Ruta, Andrew Gilbert, John Collomosse, Eli Shechtman, Nicholas Kolkin, European Conference of Computer Vision 2024, Vision for Art (VISART VII) Workshop, 2024
Interpretable Action Recognition on Hard to Classify Actions
Anastasia Anichenko, Frank Guerin, and Andrew Gilbert, European Conference of Computer Vision 2024, Human-inspired Computer Vision Workshop, 2024
DEAR: Depth-Estimated Action Recognition
Sadegh Rahmani, Filip Rybansky, Quoc Vuong, Frank Guerin, Andrew Gilbert, European Conference of Computer Vision 2024, Human-inspired Computer Vision Workshop, 2024
Towards Rapid Elephant Flow Detection Using Time Series Prediction for OTT Streaming
Anthony Orme, Anthony Adeyemi-Ejeye and Gilbert, Andrew, 19th IEEE International Symposium on Broadband Multimedia Systems and Broadcasting BMSB 2024
PLOT-TAL–Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
Ed Fish, Jon Weinbren, Andrew Gilbert, ArXiv abs/2403.18915, 2024
2023
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization
Ed Fish, Jon Weinbren, Andrew Gilbert, NeurIPS 2023 Workshop on Machine Learning for Audio, 2023
MOFO: MOtion FOcused Self-Supervisionfor Video Understanding
Mona Ahmadian, Frank Guerin, Andrew Gilbert, NeurIPS 2023 Workshop Self-Supervised Learning: Theory and Practice, 2023
UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer
Soon Cheong, Armin Mustafa, Andrew Gilbert, ICCVWS’23 2nd computer vision for Metaverse workshop, 2023
DECORAIT - DECentralized Opt-in/out Registry for AI Training
Kar Balan, Alex Black, Simon Jenni, Andy Parsons, Andrew Gilbert, John Collomosse. The 20th ACM SIGGRAPH European Conference on Visual Media Production (CVMP’23), 2023 - Best Paper
Ekila: synthetic media provenance and attribution for generative art
Kar Balan, Shruti Agarwal, Simon Jenni, Andy Parsons, Andrew Gilbert, John Collomosse, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023
2022
SVS: Adversarial refinement for sparse novel view synthesis VM González, A Gilbert, G Phillipson, S Jolly, S Hadfield arXiv preprint arXiv:2211.07301 2 2022
Hypernst: Hyper-networks for neural style transfer D Ruta, A Gilbert, S Motiian, B Faieta, Z Lin, J Collomosse European Conference on Computer Vision, 201-217 5 2022
StyleBabel: artistic style tagging and captioning D Ruta, A Gilbert, P Aggarwal, N Marri, A Kale, J Briggs, C Speed, H Jin, … European Conference on Computer Vision, 219-236 9 2022
Two-Stream Transformer Architecture for Long Form Video Understanding E Fish, J Weinbren, A Gilbert British Machine Vision Conference (BMVC) 2022
SaiNet: Stereo aware inpainting behind objects with generative networks VM González, A Gilbert, G Phillipson, S Jolly, S Hadfield arXiv preprint arXiv:2205.07014 1 2022
Kpe: Keypoint pose encoding for transformer-based image generation SY Cheong, A Mustafa, A Gilbert arXiv preprint arXiv:2203.04907 5 2022
2021
Rethinking genre classification with fine grained semantic clustering
Ed Fish, Jon Weinbren, Andrew Gilbert, IEEE International Conference on Image Processing (ICIP), 2021
Neural architecture search for deep image prior K Ho, A Gilbert, H Jin, J Collomosse Computers & graphics 98, 188-196 37 2021
Human-like Relational Models for Activity Recognition in Video J Chrol-Cannon, A Gilbert, R Lazic, A Madhusoodanan, F Guerin arXiv preprint arXiv:2107.05319 2021
ALADIN: All Layer Adaptive Instance Normalization for Fine-grained Style Similarity D Ruta, S Motiian, B Faieta, Z Lin, H Jin, A Filipkowski, A Gilbert, … arXiv preprint arXiv:2103.09776 23 2021
2019
Semantic estimation of 3d body shape and pose using minimal cameras A Gilbert, M Trumble, A Hilton, J Collomosse arXiv preprint arXiv:1908.03030 1 2019
Fusing visual and inertial sensors with semantics for 3d human pose estimation A Gilbert, M Trumble, C Malleson, A Hilton, J Collomosse International Journal of Computer Vision 127, 381-397 66 2019
Automatic image annotation at ImageCLEF J Wang, A Gilbert, B Thomee, M Villegas Information Retrieval Evaluation in a Changing World: Lessons Learned from … 3 2019
2018
Inpainting of wide-baseline multiple viewpoint video A Gilbert, M Trumble, A Hilton, J Collomosse IEEE Transactions on Visualization and Computer Graphics 26 (7), 2417-2428 4 2018
Deep autoencoder for combined human pose estimation and body model upscaling M Trumble, A Gilbert, A Hilton, J Collomosse Proceedings of the European Conference on Computer Vision (ECCV), 784-800 65 2018
Volumetric performance capture from minimal camera viewpoints A Gilbert, M Volino, J Collomosse, A Hilton Proceedings of the European Conference on Computer Vision (ECCV), 566-581 60 2018
Disentangling structure and aesthetics for style-aware image completion A Gilbert, J Collomosse, H Jin, B Price Proceedings of the IEEE Conference on Computer Vision and Pattern … 15 2018
2017
Total capture: 3d human pose estimation fusing video and inertial sensors
M Trumble, A Gilbert, C Malleson, A Hilton, J Collomosse. Proceedings of 28th British Machine Vision Conference, 1-13 290 2017
Total capture: 3d human pose estimation fusing video and inertial sensors
Image and video mining through online learning A Gilbert, R Bowden Computer Vision and Image Understanding 158, 72-84 10 2017
Guided optimisation through classification and regression for hand pose estimation P Krejov, A Gilbert, R Bowden Computer Vision and Image Understanding 155, 124-138 35 2017
Real-time Full-Body Motion Capture from Video and IMUs C Malleson, A Gilbert, M Trumble, J Collomosse, A Hilton 92 2017
2016
Deep convolutional networks for marker-less human pose estimation from multiple views M Trumble, A Gilbert, A Hilton, J Collomosse Proceedings of the 13th European conference on visual media production (CVMP … 22 2016
Overview of the ImageCLEF 2016 Scalable Concept Image Annotation Task. A Gilbert, L Piras, J Wang, F Yan, A Ramisa, E Dellandrea, … CLEF (Working Notes), 254-278 24 2016
Learning Markerless human pose estimation from multiple viewpoint video M Trumble, A Gilbert, A Hilton, J Collomosse Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8 … 4 2016
Learning multi-class discriminative patterns using episode-trees RB Eng Jon Ong, Nicolas Pugeault, Andrew Gilbert IARIA 2016
2015
Geometric Mining: Scaling Geometric Hashing to Large Datasets A Gilbert, R Bowden 3rd Workshop on Web-scale Vision and Social Media (VSM), at ICCV 2015 2 2015
Overview of the ImageCLEF 2015 Scalable Image Annotation, Localization and Sentence Generation task. A Gilbert, L Piras, J Wang, F Yan, E Dellandrea, RJ Gaizauskas, … CLEF (Working Notes) 39 2015
Combining discriminative and model based approaches for hand pose estimation P Krejov, A Gilbert, R Bowden 2015 11th IEEE international conference and workshops on automatic face and … 34 2015
Data mining for action recognition A Gilbert, R Bowden Computer Vision–ACCV 2014: 12th Asian Conference on Computer Vision … 5 2015
General overview of ImageCLEF at the CLEF 2015 labs M Villegas, H Müller, A Gilbert, L Piras, J Wang, K Mikolajczyk, … Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th … 109 2015
2014
Capturing relative motion and finding modes for action recognition in the wild O Oshin, A Gilbert, R Bowden Computer Vision and Image Understanding 125, 155-171 17 2014
A multitouchless interface: expanding user interaction P Krejov, A Gilbert, R Bowden IEEE computer graphics and applications 34 (3), 40-48 8 2014
2012
Meeting in the Middle: A top-down and bottom-up approach to detect pedestrians A Shaukat, A Gilbert, D Windridge, R Bowden Proceedings of the 21st International Conference on Pattern Recognition … 3 2012
A picture is worth a thousand tags: automatic web based image tag expansion A Gilbert, R Bowden Asian Conference on Computer Vision, 447-460 10 2012
Data fusion in ubiquitous networked robot systems for urban services L Merino, A Gilbert, J Capitán, R Bowden, J Illingworth, A Ollero annals of telecommunications-annales des télécommunications 67, 355-375 15 2012
2011
igroup: Weakly supervised image and video grouping A Gilbert, R Bowden 2011 International Conference on Computer Vision, 2166-2173 10 2011
Push and Pull: Iterative grouping of media A Gilbert, R Bowden British Machine Vision Conference 2011 2 2011
There is more than one way to get out of a car: Automatic Mode Finding for Action Recognition in the Wild O Oshin, A Gilbert, R Bowden Iberian Conference on Pattern Recognition and Image Analysis, 41-48 2011
Visualisation and prediction of conversation interest through mined social signals D Okwechime, EJ Ong, A Gilbert, R Bowden 2011 IEEE International Conference on Automatic Face & Gesture Recognition … 6 2011
Capturing the relative distribution of features for action recognition O Oshin, A Gilbert, R Bowden 2011 IEEE International Conference on Automatic Face & Gesture Recognition … 31 2011
Social interactive human video synthesis D Okwechime, EJ Ong, A Gilbert, R Bowden Computer Vision–ACCV 2010: 10th Asian Conference on Computer Vision … 6 2011
2010
Action recognition using mined hierarchical compound features A Gilbert, J Illingworth, R Bowden IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (5), 883-897 231 2010
Decentralized sensor fusion for ubiquitous networking robotics in urban areas A Sanfeliu, J Andrade-Cetto, M Barbosa, R Bowden, J Capitán, … Sensors 10 (3), 2274-2314 60 2010
Learning to recognise spatio-temporal interest points OT Oshin, A Gilbert, J Illingworth, R Bowden Machine Learning for Human Motion Analysis: Theory and Practice, 14-30 1 2010
2009
Fast realistic multi-action recognition using mined dense spatio-temporal features A Gilbert, J Illingworth, R Bowden 2009 IEEE 12th international conference on computer vision, 925-931 226 2009
Accurate fusion of robot, camera and wireless sensors for surveillance applications A Gilbert, J Illingworth, R Bowden, J Capitan, L Merino 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV … 7 2009
Action recognition using randomised ferns O Oshin, A Gilbert, J Illingworth, R Bowden 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV … 26 2009
2008
Incremental, scalable tracking of objects inter camera A Gilbert, R Bowden Computer Vision and Image Understanding 111 (1), 43-58 54 2008
Scalable and adaptable tracking of humans in multiple camera systems A Gilbert PQDT-UK & Ireland 3 2008
Scale invariant action recognition using compound features mined from dense spatio-temporal corners A Gilbert, J Illingworth, R Bowden Computer Vision–ECCV 2008: 10th European Conference on Computer Vision … 136 2008
Tracking objects across uncalibrated arbitrary topology camera networks R Bowden, A Gilbert, P KaewTraKulPong Intelligent Distributed Video Surveillance Systems, 157-182 6 2006
Poster Session II-Tracking and Motion-Tracking Objects Across Cameras by Incrementally Learning Inter-camera Colour Calibration and Patterns of Activity A Gilbert, R Bowden Lecture Notes in Computer Science 3952, 125-136 2006
Tracking objects across cameras by incrementally learning inter-camera colour calibration and patterns of activity A Gilbert, R Bowden Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz …