Thank you. Okay, welcome to this next paper session here at Expanded. It's our last session for today, and it's about generative AI. My name is Katrin Brubst, and I'm happy to guide you through this session. I'll give it my best. To begin, I have two pieces of news for you: one good, one bad. I'll start with the bad news. Unfortunately, one of our artists couldn't make it; he had to cancel due to illness. So let's send our best wishes to Michael Wallinger in Vienna — he was really sad not to be part of this. That's the sad news. The good news, if you can call it that: for the remaining artists, we can have a slightly more relaxed schedule. So we have four very exciting artist papers ahead of us, and just a bit of administration: for each paper we have about 10 to 15 minutes, followed by a Q&A session, and you as the audience are more than welcome to ask your questions at the end. Without further ado, I will hand over to our first presenter, Scottie Chih-Chieh Huang.

Hi, everybody. I'm Scottie Huang, and I'm very happy to be here to share my artwork. The topic is biomimicry aesthetics. My interest here is that I often think about how to use artificial means to create a kind of nature — not only in the outward appearance of the shape, but also in how to make it exhibit lifelike behavior. And this is one of the plants I'm interested in interacting with.
When I touch the mimosa — that's the name of the plant — it's very easy and relaxing, and I can wait for a reaction after I touch it; the process is very smooth. So I was thinking about how to use an algorithm to code a kind of shape that looks like a growing process and results in the form — and not only graphics, but graphics that people can interact with. It's not just an animation; people can feel free to touch the branches. The different branches are formed by a fractal tree data structure, where each branch is like a cell that can communicate with its neighboring cells, doing a kind of cellular automaton — this kind of grid computation. I think that is interesting to explore further. In this area, many researchers and artists have already done things like fractal systems, cellular automata, or using rules to build group behavior — to see birds flocking, and so on. That also inspired me.

So in this work, I was thinking about using a fractal tree structure, making some geometric designs to develop a kind of recursion rule, and using this approach to grow a kind of virtual creature. Then I used grid computation to set up mechanisms between the branches. I treat each branch as a cell; it can communicate with each neighboring branch in a kind of local communication, producing a kind of emergent behavior that people can interact with. After that, I thought about how to use visualization to make this data perform more visually, with more aesthetics, while people interact. So the first part is about the system for building up this shape. I use recursion, this kind of data structure, and then I build up a description so that each element can generate itself again and again, with the process changing a little each time, so the form grows cyclically.
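The recursive branching described here could be sketched roughly as follows. This is a minimal illustration, not the artist's actual system; the parameter names and values (`spread`, `decay`, the binary left/right split) are assumptions chosen to show the idea of an element generating itself again and again.

```python
import math

def grow(x, y, angle, depth, length=1.0, spread=0.5, decay=0.7):
    """Recursively generate the line segments of a fractal tree.

    Returns a list of (x1, y1, x2, y2, depth) segments. Each branch
    spawns two children, mirrored left and right, with shorter length —
    an illustrative stand-in for the recursion rules in the talk.
    """
    if depth == 0:
        return []
    x2 = x + length * math.cos(angle)
    y2 = y + length * math.sin(angle)
    segments = [(x, y, x2, y2, depth)]
    segments += grow(x2, y2, angle - spread, depth - 1, length * decay)
    segments += grow(x2, y2, angle + spread, depth - 1, length * decay)
    return segments

# grow a small tree from the origin, pointing upward
tree = grow(0.0, 0.0, math.pi / 2, depth=5)
```

Each segment keeps its depth so later stages (neighbor communication, rendering rules) can treat every branch as an addressable cell.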
One small difference: when a module generates itself, it also builds in a kind of motor cell — a rotational mechanism — and with this motor-cell mechanism I can do the folding behavior later. This work is an earlier installation of mine; in it, I was thinking about how to use a truss structure to do a kind of folding shape change. So I used this recursion and growth, and then added symmetry to build up a flower-like form. You can see they have the branch and the fluff, and I use these elements to generate the form. The growing branch on the left is put into symmetry — each branch can be mirrored to the left side and the right side — so it looks like a flower-like form.

A very big problem is that each time, this system generates something like 100,000 lines, so the lines overlap and it's a total mess. So I was thinking not about how to create data, but about how to find a way for the data to give some visual presence, so people can feel the detail of the aesthetics. I built rules that change the line thickness and transparency according to the iteration layer. Using this logic, the outcome is much better. So I have some rules to translate the data — based on each iteration and the number of branches — that adjust how the stroke and the transparency are presented. Thank you. You can see that at this step, the flower is very similar to a fern-like form. The next step: as when I play with the mimosa, the leaves make a spreading movement, like a ripple moving outward — so how do I build that kind of adaptation into this fractal tree data structure? Something inspired me.
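A depth-based rendering rule like the one described — thinner, fainter strokes for later iterations so the overlapping lines stay readable — might look like this. The linear falloff curve and constants here are illustrative assumptions, not the artist's exact rule.

```python
def stroke_style(layer, max_layer, base_width=4.0, base_alpha=1.0):
    """Map an iteration layer to (line width, transparency).

    layer 0 is the trunk, max_layer the outermost tips. Deeper layers
    get thinner and more transparent, so dense tip geometry reads as
    texture instead of a solid mess.
    """
    t = layer / max_layer              # 0.0 at the trunk, 1.0 at the tips
    width = base_width * (1.0 - t) + 0.2   # never fully zero-width
    alpha = base_alpha * (1.0 - 0.8 * t)   # fade, but keep tips visible
    return width, alpha
```

Applied per segment (each carries its iteration depth), this turns the raw 100,000-line output into a drawing with visible hierarchy.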
When I look at this kind of plant — it's a sensitive plant — and at the cellular automaton mechanism, each cell simply communicates with its neighboring cells and follows a simple rule. And when every cell follows the same simple rules, you get a kind of adaptation and a kind of emergent behavior. But in the CA literature, many examples are only about presenting data as dead or alive — just a zero and a one. So I was thinking about how to use the data as more than data: as a signal that triggers the motor to do a kind of shrinking behavior. When this mechanism works, it can trigger shape changes, and the shape changing can have multiple possibilities. In this picture we can see that when each cell holds a zero, the result is open, and when a cell holds a one — like 'alive' — it closes. So I can use these rules so that many data work together, each one triggering its neighboring branch; the branch is not just a status flipping between zero and one — the sequence of numbers is translated to guide the rotation, so the shape changes in an ordered way.

Let me check my... okay. So there are many possibilities between a flower closing and a flower opening. There are 32,000 branches — branch cells — that can do a rotating, shrinking behavior, and they trigger each other as neighbors, so the interaction becomes very elegant. In this movie, you can imagine the green spot as a fly, or as someone's touch, randomly touching the pixels — and the pixel is not only graphics; it's really a virtual creature. You can touch it and sense the shape changing, its movement behavior, in real time. Thank you. And I think something appeared that I never thought of when I built the data.
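The touch-triggered ripple could be sketched as a cellular-automaton step over the branch cells, with the cell state translated into a motor rotation. This is a simplification under assumed rules: a cell starts closing if any neighbor is closing, and the state linearly drives a fold angle.

```python
def step(states, neighbors):
    """One cellular-automaton step over branch cells.

    states: dict cell_id -> 0 (open) or 1 (closing)
    neighbors: dict cell_id -> list of adjacent cell ids
    A cell begins closing if it or any neighbor is closing,
    so a touch ripples outward through the tree.
    """
    return {
        cell: 1 if s == 1 or any(states[n] == 1 for n in neighbors[cell]) else 0
        for cell, s in states.items()
    }

def fold_angle(state, max_angle=90.0):
    """Translate a cell state into the rotation driven by its motor cell."""
    return max_angle * state

# a touch on cell 0 of a four-cell chain ripples to its neighbors
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
after_touch = {0: 1, 1: 0, 2: 0, 3: 0}
ripple1 = step(after_touch, neighbors)
ripple2 = step(ripple1, neighbors)
```

In the real piece the state would also decay so the flower reopens; a ripple like this over 32,000 branch cells is what makes the form feel alive rather than animated.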
It's like the virtual creature is dancing with me. I also saw details I hadn't expected when I interacted with the graphics, and some interesting patterns appeared. So I had new ideas: what if the pixel doesn't only exist on the monitor? What if I use something robotic, like a pen, to make these lines — strokes with ink, rendered on paper? Maybe there are possibilities I can pursue there. This step is just the beginning, but I think it could be an interesting design process: I create a virtual creature and I play with it — it's like a medium — and then I use robotics to physicalize the result of that medium. Maybe it's a new way of creation, from the virtual to the physical.

As for other future work, I hope this project can be shown in a larger-scale projection environment, like 8K, where people handle a small touch monitor while the image presents more detail — detail the audience can sense, getting more information and enjoying it more. Another point: I refer to the CA, the cellular automaton mechanism, but I only use a very simple rule in this project. I think there could be more possibilities from CA rules — doing some mutation between branches, not only rotating a branch or an angle, but something more. Thanks.

Thank you very much. So do we have any questions from the audience? Yeah, here in the back.

Scottie, fascinating work, thank you. I really loved your expression when you said, "Oh, I got so excited when I saw this." Did you really feel as though you were giving this entity life, some sentience — that it was becoming a living entity? Is that how you felt at that moment? Or was it the element of surprise, that it did something you didn't expect?
Actually, I went through a lot of trial and error, so I'm more curious about the audience's interaction, and I ask them how they feel about it. But yes, I feel like it's dancing with me, with my finger. So it feels more alive — I forget a little bit that it is artificial.

Ah, okay. I really love the idea of this coming to life. Do we have other questions from the audience? Yeah, one more.

That's fine — I'm just going to hug the microphone for a while. I was also interested in the transition from the three-dimensional robotic work you had done earlier, to strictly screen-based work, and now to a sort of modified plotter-based approach — the robotic arm doing the drawing. Could you talk a little more about why you're deciding to go from this entity that exists in three dimensions, which I can walk around and interact with in maybe a more natural way than touching a screen, all the way back to something that's a little less tangible?

Actually, my background is in architecture, so at the very beginning my idea was to use the computer as a medium, like computer-aided design. But I'm a more romantic person, so when I focus on something, I forget what came before — I just keep changing. But I think I still love things that are kinetic, interactive, and alive, so I keep circling around this area.

Great. So we have one more question.

Thanks for your talk. I was fascinated by the little movie of the fly getting eaten, and I wondered if there's some way to incorporate the fly — or are we the fly? I mean, that's sort of a joke, but it's a serious question about the progression of the experience and this kind of dissolution that's happening. Is that something you're thinking about as well? Because it got eaten in the biological system. I'm thinking about the user experience of interacting with the piece.
And is there a sense in which, because you're inspired by these biological systems, you put the user in the position of being consumed in some way?

I hope people feel relaxed playing with my work. Actually, after you touch it, it's sensitive, and you need to wait a while for the flower to open again.

Okay, thank you very much for this interesting question. Yeah, being hugged — it's really an excellent experience. Thank you. And with that, we move on to our next speaker, who is Verena Repard.

Hello, thank you so much for having me and for inviting me. Congratulations, Scottie, on your astonishing work — I would really love to interact with it. I'm super curious, and I'd like to talk with you a little more in depth later on. Congratulations, really astonishing work. So, I'm Verena. I also contributed an art paper to this year's edition, titled Exploring the Transformative Potential of Evocative Interactions with Diffusion Models. Today I'll share more of my personal experience of interacting with diffusion models in the process of filmmaking, and I'll be talking about my short film Echoes of Grief, which I produced last year as my graduation film at the University of Applied Arts in Vienna. I studied graphic design, and I'm really fond of experimental animation and filmmaking. I'll share my experiences of how the integration of generative AI was applicable to my personal artistic expression, and how it augmented it as well. So, I used Stable Diffusion.
Stable Diffusion is a latent text-to-image diffusion model. It works like this: it is trained by adding and subtracting noise from images, it learns patterns from that, and based on the dataset it has learned it generates novel imagery; it's built on a convolutional neural network. I used this tool — actually several different applications of it — on three levels. First, as a creative material, to sketch my own ideas and to augment my sketching skills in a way. Secondly, as a filter to stylize my own visual material — about 95% of the whole film used this second technique. And last but not least, I used it in a co-creative manner, in a more human-machine-interactive way, where 40 seconds of my film are purely AI-generated; I'll talk about this in a second.

This is a short excerpt of my film — my favorite scene, actually. The film is about the experience of a devastating loss. It's a very universal but still very personal story, and I tried to convey the feeling of losing a very dear person. Exactly — I think this slide should have come earlier, but it doesn't matter; I've already told you what I used. So now I'll talk a little about the sketching part. Obviously I started with my physical body: I created pencil sketches to set the scene, to set the tonality of the film, to build the worlds. Then I started to sketch further with Blender and a Stable Diffusion plug-in. On the right side you see the final scene, the very last outcome — I worked on this over the scope of two years, so the evolution is quite enormous. Exactly — those are three different examples.
And I was quite surprised to feel that this prompting with my own visual images — and the imagery I got back from the generator — was broadening my visual horizons, I think I can say that. It really helped me streamline my sketching process, and I'm very thankful for that; it was quite a surprising element for me.

Now a little about the filter technique I used. I worked mostly in this environment, using a technique called image-to-image synthesis. It's a browser-based, Python-based application: you can fork the repository, install it on your own PC, and it runs as an app in the web browser. Just a little about the workflow: I created the worlds and animated them in Unreal Engine and Blender — all of it, actually. Then I exported PNG sequences, put them through Stable Diffusion, got stylized PNG sequences out, and then I composited — interpolated — the two layers together. This is part of the first scene of the film in Unreal Engine; for those who don't know, Unreal Engine is a game engine. And this is the actual thing — okay, it's lagging a little, I'm very sorry. On the left you see the Unreal Engine sequence, in the middle the stylized version, and on the right the interpolated one. I tried to interweave the two of them, because the problem on the left side was that I was dealing with such a personal, organic topic — grief, losing someone — and I thought that for my purpose this very distinct CGI look was somewhat out of place. So I was quite happy to find a way to make it more organic, to give it much more texture and make it more analog. This is also the Blender scene — the last scene of the film, I guess. You see it's just a very slight, very small percentage of diffusion that has been applied to the imagery, but it still has a huge effect.
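The compositing step she describes — interpolating the original render with its diffusion-styled counterpart — amounts to a per-pixel blend. This sketch assumes frames as nested lists of RGB tuples as a stand-in for real image arrays (an actual pipeline would use NumPy or PIL); the `strength` parameter mirrors the small diffusion percentage she mentions, though in her workflow the diffusion amount is also controlled inside the image-to-image step itself.

```python
def blend_frames(original, stylized, strength=0.02):
    """Linearly interpolate an original frame with its stylized version.

    original, stylized: frames as lists of rows of (r, g, b) tuples.
    strength: fraction of the stylized frame to mix in; 0.02 keeps the
    original animation dominant with only a subtle analog texture.
    """
    out = []
    for row_o, row_s in zip(original, stylized):
        out.append([
            tuple(round((1 - strength) * o + strength * s)
                  for o, s in zip(po, ps))
            for po, ps in zip(row_o, row_s)
        ])
    return out
```

Run over an exported PNG sequence frame by frame, this produces the "interpolated" layer shown on the right of her comparison.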
This changing and morphing is inherent to these technologies. In most of what I was doing, you put in your own image, you put in your prompts, and then you specify how much diffusion will be applied to your images — if you say 80%, most of your imagery will be distorted; in my case I used 2%. I worked with a model called Analog Diffusion, which was trained on a whole dataset of analog photography. Here you have a very good comparison between the stylized animation and the original animation — it's very slight, but I think it has a huge impact. And here as well: the original and the stylized. Again, the original is very sharp and hard, and the output is much more organic.

Last but not least, the third way I used Stable Diffusion in my workflow was this co-creative process, as I call it — when you really animate with words. This is also in the Stable Diffusion web UI environment. I won't go into depth on all the parameters, because I could fill three hours with that, but the key thing is on the right side: you see the blue circled square with different numbers, and the numbers mark frames. This is a default prompt — I did not write it; it comes with the application — and it says: at frame zero, create a "tiny cube swamp bunny", and from frame 30 on, create an "anthropomorphic clean cat", and so on and so forth. So instead of animating with your hands or with 2D software, you animate your keyframes with words. This is the final scene I came up with for my film, and this whole process of renegotiating my own artistic agency was very interesting — because I really had the feeling that if all of this is generated, then where is my own artistic sovereignty?
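"Animating with words" via keyframed prompts can be sketched as a lookup: the prompt set at the nearest earlier keyframe stays active until the next one. This is a simplified model — schedulers of this kind (e.g. in the Deforum animation extension) also interpolate between prompts, which this hard-switch sketch omits.

```python
def active_prompt(schedule, frame):
    """Return the prompt in effect at a given frame.

    schedule: dict mapping keyframe number -> prompt text, as in the
    default example: frame 0 sets one prompt, frame 30 replaces it.
    """
    frames = sorted(schedule)
    current = schedule[frames[0]]
    for f in frames:
        if f <= frame:
            current = schedule[f]
        else:
            break
    return current

# the default schedule she describes from the web UI
schedule = {0: "tiny cube swamp bunny", 30: "anthropomorphic clean cat"}
```

The generator is then driven frame by frame with `active_prompt(schedule, frame)`, so the written keyframes take the place of hand-drawn ones.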
I think these 40 seconds were most of the work; I spent most of the time figuring things out, regenerating, trial and error — as you also said, it's a lot of trial and error, making mistakes and working with the mistakes. So I always had the question: what about this is mine? Do I own this? Do I have ownership? Then I came up with this filtering technique and regained my agency, so I was quite happy about that. But what was very surprising for me was still the interaction itself — because I was spending so much time alone in my office, in front of my PC, prompting and formulating, putting into words what I had never before articulated in such a distinct way. Maybe it sounds cheesy, but it was kind of healing; it felt like journaling. Because if you talk to a friend, a family member, or even a therapist, you never really go so deep into these very distinct childhood memories. What surprised me most was that it kept evoking long-forgotten memories — if you talk to a friend or a therapist, you do not get this immediate visual feedback on what you're putting in. Suddenly I started to remember moments I had already forgotten, from 20, 25 years ago. So it was surprisingly therapeutic. I would still advise: please see a therapist if you need one. And I was also quite astonished to find this very strange relationship between the process of bereavement — letting go, losing somebody — and just having to lean back and watch life unfold. I felt that when I was sitting in my dark studio in front of the PC, watching the generations, going: no, this is not what I wanted, stop that.
And I always had to go back and try again and try again. At a certain point I just had to give up, take what was presented to me, and work with that. So I think I was not really creating — I was more curating happy little accidents. And I think it was still a very refreshing, very novel experience for me, and I'm very happy that I was able to have it with myself and with this machine instance, however you may call it. Thank you very much. Thanks for the opportunity, and congratulations, everybody.

Okay, thank you for this very emotional talk. So, we have a question already. — Yes, the screening, thank you for reminding me: tomorrow at 10 at the JKU MedSpace, and on Sunday also at 10, I think. You can check out all those locations on the map; I guess you can find it easily. So everybody, be sure to check it out — you're more than welcome. Do we have more questions? Yeah, over there — let's start here in the back; the microphone is already there.

Hi. Thank you for showing us this — it's really, really cool work. There were a couple of things you said that I find really interesting. I hope you're going to be here tomorrow for our talk, because we work with similar things, but more with liveness, and I know some of these experiences really well. I really find it interesting that you talk about letting go of the need for ownership a little bit, how that feels, and how it becomes a co-creation.
And I was wondering if this is something you're going to integrate into your artistic practice more, and whether it changed your feeling of ownership and how important ownership is in your work?

I think that's a big thing you're addressing. I started the film about two years ago — exactly when this whole AI hype hit the mainstream — and I was naive enough to think I could create a 20-minute narrative solely with generated imagery. I thought, oh, this will work, it's the magic box: I just put in my prompts and it generates the whole film. But it was not like that at all, and most of the struggle was finding a way to integrate this very nice tool into my own artistic practice. But also, I think it's a thing with ego: in a way, you have to throw it out and be okay with curating more than creating. It was also very relaxing, because I didn't feel responsible for what happened — I only felt responsible for choosing whether or not to implement it in the film. Other than that, it was like watching somebody create something for you, and you're the art director saying, yes, I'll take that, or no, that's bullshit.

I understand what you're saying. It's sometimes this back and forth: you try several ways of controlling it, and then it does something unexpected for you, and you're happy that it did. It's a collaborative process, don't you think?

Exactly, exactly. It presented ideas and images I would never have been able to come up with myself, and I think that was a very fruitful collaborative process, as you say.
You're surprised by something you wouldn't have thought of yourself, and then you take it and work with it again. So it's a very nicely intertwined way of having an agent with you. Also, filmmaking involves a lot of solitude — you're very lonely, especially animating 20 minutes of film; it's a lot of work. And I was so happy, because for the first time I felt I was not only with myself but had a buddy, in a way. I don't know if that's romanticized, but it felt very interesting.

I totally get that. I was testing out the new ControlNet yesterday, and I was like, finally you get me.

It's so funny, because I did this in the stone age of generative AI, two years ago. It took me so much work to come up with this filter workflow — how do I do it, how do I get control — and then ControlNet came out, and I was like, come on, I finished my film and now you're putting this out? It's crazy how fast these technologies move. Every week. Exactly — you really have to study hard to keep up. I'd love to meet you later. Yeah, let's talk — I'm here. Thank you very much.

Okay, thank you. We had a question over there.

Thank you so much for this great presentation; I enjoyed listening. I think this is closely related but maybe adds a different layer. It's just an assumption: when you want to tell a personal story, you may have some preconceived notion of the emotions you're trying to evoke. So how hard was it to create a specific emotion — or at least, visually, a specific image — using text? Because these are just prompts, and then you get a, say, random result. I have no idea, but I imagine you'd often find: oh no, this is not what I want. And then, again, a different text.
So how hard was it to really get the result you wanted?

I think the scene I showed was the first I was really okay with. Before that I had — I don't know how many sequences, maybe 50, 60, 70; I stopped counting at a certain point. You always get exactly that — "this is not what I wanted" — or the first 50 frames are okay, nice, looks good, and suddenly it starts to deviate into something completely different. It was a lot of work, a lot of trial and error, a lot of stitching together different sequences, regenerating, and at a certain point, also just giving up. Thanks.

Thank you. I was about to ask the very same question — thanks for the answer. Okay, we have one more. Jürgen?

Thank you for your presentation. I do not have a question, but I want to add that there is a screening tomorrow at 10 o'clock, and on Sunday at 10:45. It's a very complicated schedule at Ars Electronica, so here's a hint: if you go to the Expanded Animation website, there is an orange button on the left with a direct link to the screening program at the JKU MedSpace. That was my question.

All right, thank you for the advice. So: JKU MedSpace, via the Expanded Animation website. You should be able to find it; otherwise, talk to Jürgen later on — he will hopefully help you out. Okay, thanks. And with that, we move on to our next artist. Thank you once again.

Our next artist is Eugene Yang — I'm afraid we need a bit more time to arrange the presentation, so should we swap? If that's fine, we can swap spots and have you as the last presenter. Then we move on with our next team; I hope everybody's here. This will be a more complicated one for me — I'm already apologizing for not pronouncing your names correctly: Wee-Jung-Soo, Yu-Fang-Leh, and Zhang-Yu-Ye. I hope that's somewhat close.
Thank you. We're glad to be here to share our ideas and art with everyone, and we thank every artist. We are three artists from Taiwan, and this paper is research based on our performance art. Our topic is the prayer ritual of spiritual religion — it's a long title. In these 10 minutes we will focus more on what inspired us; then we will discuss how we made this real and how it challenges the medium we're discussing right now, virtual reality. Here are our team and performance members: Yeh-Chen Yun, Lai Yu-Fang, and Ming Weichen.

First, some background on the ritual we mention in our topic: Guan Luo Yin. Guan Luo Yin is a Taiwanese traditional ritual. As with rituals in general — such as shamanism or mediumship, which you may have seen or heard of before — there is a priest guiding you, like in the picture here, and the purpose of the ritual is to travel through the underworld and visit people you are missing. The critical step is that participants need to blindfold themselves, as the picture shows, and disconnect from the visual world — watching something that is not real, not in reality. That is really similar to virtual reality, so we can somehow find relationships or connections between virtual reality and the ritual.

Based on this context, we tried to evaluate three problems through the characteristics of these media. First, we aim to retrieve the characteristics of the ritual in virtual reality; second, through this retrieval, to develop a new form of virtual reality performance or artwork; and third, with this artwork, to provide audiences and performers with a co-creative narrative experience. Now let me quickly mention the three characters in the ritual. First, we have the priest — someone guiding the ritual and making it happen. Second, we have the participants — the people who experience the ritual.
And he or she is also the creator of his or her own experience. Third, we have the mysterious power — maybe a god or a spirit. So, how do we retrieve this relationship in virtual reality? We introduce the new form of narrative utilized in our artwork and this project. We call it the triangular co-creating narrative.

This triangular co-creating narrative draws on the interactive characteristics and narrative method of the Guan Luo Yin ritual, and it brings together three key elements: the audience, artificial intelligence, and the performers. As part of our immersive performance, the performers act as mediators, playing a crucial role in bridging the physical and virtual worlds. On the left side of this photo is performer A, who interacts directly with the audience through conversation to create a safe environment and guide them into the narrative. On the lower right of the picture, behind the computer, is performer B, who translates these personalized narratives into prompts for our AI system. And the third performer, C, controls the soundscape, indicating the stages of the ritual. As for the artificial intelligence, its database and computational power play a significant role in expanding the script's visual possibilities, creating experiences and narratives that are almost impossible to replicate.

Here is a quick overview of the technical architecture behind our immersive performance. We built our system using TouchDesigner software and Oculus Quest 2 hardware, which enables us to integrate the various systems and process images in real time. To handle image generation, we use a module called Raging, which can produce images three times faster than the standard Stable Diffusion method, creating coherent, real-time, frame-by-frame animations in virtual reality. So the first state we are trying to achieve is bringing co-present journeys into virtual reality.
It means participants are not only immersed in the virtual environment, but also interact with someone in real life, with real people. The second, the transcendence of consciousness: we invite participants to immerse themselves in their own narrative, to make the experience close to the participants' own memories, to themselves. And the third state: we are trying to achieve an unreplicable image experience. We try to break the so-called fixed script, the pre-written script, creating a real-time, personal experience. Okay, and this project has been performed in three versions of settings. In stage one, we performed it in a black-box theater. And, oh, here, this is part of our demo video of this version. Like this: there is a priest talking to the participants, there is sound being controlled in real time, and also the AI images generated in real time. And this is the visual effect in this version. In stage two, we invited students from different majors, with varying levels of familiarity with VR and with diverse cultural backgrounds. We eliminated negative experiences by modifying our AI models and keywords, and also translated and revised the script to successfully involve the audience's own life experience. For instance, the Russian audience member in the picture noticed images related to her hometown, and we transported her back to a childhood she rarely recalled. And in stage three, we exhibited at Laval Virtual, where our audience consisted of industry professionals from diverse cultural backgrounds. As in this video, we simplified our equipment to perform in a small stall, and the guidance of the priest is transmitted through a microphone to headphones. Another moment: a viewer shed tears during her experience, reflecting on seeing a doll and recalling a departed friend.
She described the experience as healing and peaceful, enabling her to confront her sorrow comfortably. So this system has been experienced by about 60 audience members of various nationalities. Most of them described it as a calming and immersive experience; some mentioned it as a meditation-like trip. And, as the example in stage three shows, it has a soothing and comforting effect, giving a healing and peaceful journey. These three phases have proven the emotional effect of this system. Compared to previous VR projects, the new form guides personalized storytelling and spiritual resonance more effectively, providing audiences with an unreplicable experience. Replying to the three questions we stated in the beginning: this triangular co-created narrative structure, co-created by AI, audience interaction, and performance improvisation, retrieves the essence of Guan Luo Yin and other similar rituals. This structure goes beyond the limitation of the fixed script and provides a new narrative form for virtual reality. We can state that it can adapt to inner-exploration content regardless of cultural differences. Furthermore, we see the potential for this system to adapt to different improvisational scenarios and look forward to its application in various future works. In the next phase of development, we plan to gather additional feedback by participating in various exhibitions. Additionally, we hope to collaborate with experts in the artificial intelligence and theater art fields to evaluate and refine our system. And this is the end of our presentation. Thank you so much. Thank you very much for this presentation. So, do we have questions in the audience? Here on the right, we have one question. And hopefully also a mic. Oh yeah, it's coming. Thanks for the presentation. When talking about AI, we frequently, and this is what I also recognize... It's on? You can hear me? Oh, great.
This is also what I recognize here as a pattern at the conference: we frequently talk about control, right? Control of the stimuli that we create. And I was wondering, particularly in VR, is it even more important to have control when manipulating or showing visual renderings, because of effects like motion sickness or anything else that could negatively influence me? So could you elaborate a little bit on how you try to control what the participants experience in VR visually? Okay. So the keywords for the generated images come from the conversation with the priest, so we are trying, in a sense, to visualize what he or she has imagined. But the uncontrollable part of the AI, the part that we cannot control, is also part of the aesthetic of this piece, because there is a sense that the AI wants to show the participants something. So we think that is also a very interesting part of the work. We have another question in the very front. I grabbed his microphone because my question is maybe connected to his. I still didn't get what exactly you see in the VR, in the sense of: is it very slow images? Is it 3D stereoscopic imagery? Is it rather a plane that you see, or a 360 image, or video, or rendering? So basically we have two versions. I'm not sure if you remember, there is a video showing tiny circles, and the circles are showing some picture. That is the first version, and the image is basically the real-time generative animation, a frame-by-frame animation. So the first frame and the second frame are similar, but not exactly the same; it is a frame-to-frame animation, actually.
And we have the second version, and in the second version there is a 360 environment, and you can turn around and see effects behind and around you. Yeah, so there are two different versions. Thank you. Thanks for this question. Good that you're so close, so you can pass the mic around. That's fine. Thank you. I think it also connects to the first question you had, and maybe it was already answered, I don't quite know, but I will ask anyway. You said that you try to eliminate negative or harmful imagery so that the participants do not have a bad experience, and I'm very curious: what do you define as a harmful experience in this sense, and how do you eliminate it, or how do you try not to show images that might harm a person? Yeah, for this we changed a lot of models, because some of them really generated pictures that, we can say, were kind of scary, so we changed a lot of them. And also, because the prompts are typed by the performer, she will translate them into positive keywords so as not to produce a bad experience. And sometimes I can control the CFG scale, and also many other parameters in Stable Diffusion, which dynamically change the visuals in the VR headset, so I can quickly monitor and adjust. That's a good question, because we are trying to touch their experiences or memories, but some memories may be horrible or terrible, or something they don't want to be reminded of. So it is a kind of balance between touching their experience and protecting them, and we are still trying to figure out this balance right now.
Because it's also interesting: for example, something which I do not consider horrible might affect me negatively while it does not affect you, and I think you always have to run that risk. If you put people in such a vulnerable place, it always might be that they get emotionally touched or moved. But actually, even if you don't show them very harmful images, maybe it is good that they are moved or emotionally touched in some way. So congratulations on your project. Thank you. Yeah, thank you very much for this very dynamic narrative; it sounds really exciting. Thank you once again. And now let's give it a try once again with Yu-Chen. The stage is yours. Does this work? Or this one? Both? Okay. My apologies first: there were some hiccups with my slides, and I also didn't prepare a lot of visuals, so this will be kind of a wordy one. Hi, I'm Yu-Chen. I'm from EPFL in Switzerland, in Lausanne, and I'll be presenting the work The Transaction, reconfigured through the Swiss national historical audiovisual archive. So, context. Audiovisual archives are getting extremely huge now, and access is becoming a problem. I'm not talking about search or, for example, curated screening sessions at a festival, but about access at the level of explorative and casual browsing. The archives are becoming so huge that this type of access is normally impossible, so the public cannot really access the archive as they should, freely, in a way. The result is that perhaps 1% of the content, the most important or most interesting to people, is constantly accessed, and the rest just sits on the shelf forever. One solution to this type of problem is to frame the archival content out of its historical or accurate context, in a way, and to use aspects like emotions, colors, or shot levels, or any other filters or text, to access it.
These methods, which some call generous interfaces or explorative interfaces, encourage the consumption of archival content as a more primal and iterative linguistic practice, seeing the content as matter, happenings, or narrative fragments that do not necessarily have a defined meaning yet, so that access becomes less filtered and more democratic, in a way. The rearrangement, regrouping, repurposing, and recontextualizing of these fragments is then the key to such an access model. And it is these new arrangements, the new emergent narratives, and the comparison with the original historical meanings that make this whole practice a multi-layered and deterministic narrative experience, in a way. Examples of these approaches include experimental filmmaking or interactive documentaries, with historical footage repurposed to craft new fictional narratives, or interfaces that allow users to explore and combine different fragments using filters like colors, movements, et cetera. This shift in focus not only allows an aesthetic approach to understanding the content, and our history, but also brings life to the less accessed, long-tail content in the archive. However, these methods have problems. For example, they are labor-intensive or require pre-processing for predefined texts or features, which limits their scalability and flexibility, and in a sense defeats their own purpose of opening up free-form exploration of the archive for the public. So can we find an open and free-form interpretive operator for such access? The popularization of the encoder-decoder architecture provides a great opportunity. It opens up translation between modalities using a representation, or vector, interlayer.
The encoding phase allows the semantics, or what we call the gist, of the materials to be represented as vectors in a latent space, and the decoding phase allows the translation from vectors back into modalities or materials that we are familiar with. Using this representational mindset, many works focus primarily on video and image generation, like diffusion models, creating content from scratch using the semantics we get from text or whatever modality we use as the representation. But in the context of audiovisual archives, we already have so much content, so why do we have to create new content? Why can't we just generate from the archive and use some sort of interlayer operator to recombine audiovisual content from the archive? So here I propose this work, The Transaction, where your personal story is turned into a manga strip that features Swiss national TV archival content. In this work, audiences enter any narrative text from the keyboard provided, and the system starts to match archival video clips against the input text using our joint embedding space. The keyframes from each retrieved clip are then restyled into manga images, and a thermal printer prints out a receipt for the user to keep as a present or a souvenir. The QR code at the bottom links the historical content to the new story you have just built. The overall workflow is shown here; this part will be a little more technical, I guess. The core of this matching, or translating, from natural-language input to archival video clips is the narrative-aware language-video joint embedding space that we built. We start with a standard video-text dataset that has many pairs of video clips and corresponding, very simple captions.
We then work on top of this dataset with a language-model-based pipeline to infuse the captions with more narrative elements, such as emotions, speech, colors, overall ambiance, et cetera, so that we improve the captions. Both caption versions are then sent through the Transformer-based encoder BERT to become the text representations we use for training. On the video side, we use a collective of expert encoders for the different modalities in the video, for example the Vision Transformer for visual information and the Audio Spectrogram Transformer for audio information, and we fuse these modality representations into a unified video representation. The representations from both the text and the video sides are then sent into the learning process, where the objective is essentially to pull the already-paired video and text representations close together in the space, and to push videos and captions that are not paired far away from each other. Through this training, the space is able to capture and represent the semantics of both the video and the text sides, so that it can be used as a space through which we translate between modalities. When a user types in a new input, it is split into multiple queries and encoded through our model into the space to find the nearest neighbors. Beforehand, we have processed all of the archival videos in the same manner, so that the archival footage is also in that space. So now we have the closest matches from the video archive. We then use a package to extract the keyframes of each retrieved video. Once the keyframes are gathered, they are sent to a Stable Diffusion ControlNet-based style-transfer pipeline to be transferred into a manga style, for stylistic consistency.
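The contrastive objective and the nearest-neighbor lookup described above can be sketched in a few lines. This is a minimal illustration, not the actual model: `info_nce_loss` stands in for the objective that pulls paired text and video embeddings together and pushes mismatched pairs apart, `nearest_neighbor` stands in for the archive retrieval, and the function names, temperature value, and toy two-dimensional embeddings are all hypothetical.

```python
import math

def normalize(vec):
    # scale a vector to unit length so a dot product equals cosine similarity
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def info_nce_loss(text_embs, video_embs, temperature=0.07):
    # contrastive objective: each paired (text_i, video_i) should score
    # higher than every mismatched pair in the batch
    losses = []
    for i, text in enumerate(text_embs):
        scores = [sum(a * b for a, b in zip(text, vid)) / temperature
                  for vid in video_embs]
        peak = max(scores)  # subtract the max to stabilize the softmax
        log_prob = scores[i] - peak - math.log(
            sum(math.exp(s - peak) for s in scores))
        losses.append(-log_prob)
    return sum(losses) / len(losses)

def nearest_neighbor(query_emb, video_embs):
    # retrieval: index of the archive clip whose embedding is closest to the query
    scores = [sum(a * b for a, b in zip(query_emb, vid)) for vid in video_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

In the real system the text vectors would come from BERT and the video vectors from the fused expert encoders; here any pre-computed, normalized vectors will do.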
An image-editing pipeline in Python is then used to add the user's original input prompts as speech bubbles, with some randomized typesetting. So here is an example of the results. The specific physical strip allows The Transaction to align with curator Hal Foster's vision of archival art, which makes historical information, often lost or displaced, present again in one way or another. The pixelated quality of the printed image also works well with the fragile thermal paper to create an ephemeral record: the image is doomed to fade, and the paper will easily tear and mark. The receipt also serves as a great souvenir of this transcription of private memory and public historical records, in a museum setting, for the public or the visitors. With the ability to browse, retrieve, and recombine archival video content through natural language, we hope this work can serve as a teaser for more generous or explorative interfaces that allow freer access to archival content, providing more democratized, personalized, and, more importantly, entertaining access to the archive for the general public. This is part of a larger project called Synergia: Narratives from the Long Tails. It is a collaboration between EPFL, the University of Amsterdam, and the University of Zurich, and it works with four large archives: the Swiss national television archive, the Olympics archive, the Eye Filmmuseum archive, and the Montreux Jazz Festival archive. There are going to be a lot of different interfaces, physical interfaces like this one, web interfaces, or a more immersive 360 panoramic screen interface, for all of these archives. So if you are interested, you can also hit me up afterwards and I can tell you a little bit more. And that's it. My mic works, yep. Okay, then, last round of questions here in the back.
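The keyframe-extraction step in the pipeline above is handled by an existing package; as a rough stand-in, assuming nothing about that package, a frame-difference heuristic keeps a frame whenever it differs enough from the last kept one. The function name and threshold here are hypothetical.

```python
def extract_keyframes(frames, threshold=0.3):
    # frames: flat lists of grayscale pixel values in [0, 1]
    # keep frame 0, then any frame whose mean absolute pixel difference
    # from the last kept frame exceeds the threshold
    if not frames:
        return []
    kept = [0]
    last = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if diff > threshold:
            kept.append(i)
            last = frame
    return kept
```

The returned indices would then be the frames handed to the style-transfer and speech-bubble steps.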
We have one already, or we start more in the front and keep this in mind. It's fine. Go ahead. Okay, thank you for your project. I was very curious about the use of the diary as the point of entry. Can you talk a little bit more about how you came to that as a way to access this particular archive, and would there be different types of points of entry for different types of archives? Yeah, so we did interviews and surveys before we tackled each of these archives. I am the one working with the national TV archive, and my colleague is working with the Olympics; for him, I think, gestures and motions are used to explore the archive, which makes more sense for sports videos. For me, I guess the diary makes sense because the TV archive is, in a way, a day-to-day archive of the happenings of the past. And we asked the survey respondents, the people involved in the project, whether they would like this interface to be dedicated more to collective memory, the historical events, the more important things, or whether they wanted a more casual way to access this archive through what they are doing every day. For example, I could just be eating a popsicle, but that could be part of a very historical event, say if someone important was doing it during a conference, or a concert, in a way. The result is that, because we wanted to democratize access to the archive, we don't want the more important things to be stressed even more through these interfaces. We want people to be able to access archival content they did not have access to in the past, or never even thought of accessing. And so we use this type of diary, in a way more like meta happenings, the sort of fragments that don't really mean anything in their life by themselves.
People can then reconstruct everything through their own narrative, but also compare it to the actual historical context of the different clips, so there is a kind of multi-layered narrative, or context, that we are trying to investigate a little bit more, if that answers your question. It does. Well, I'll take advantage of the microphone to ask another quick question about the manga and the form: why did you pick that? That's just purely my preference. Good enough, yeah, thank you. But yeah, this has different variations. This is a printer kind of thing that we think would be interesting for museums, to give the public a souvenir. But then, as you can see on the right, there is a video, so it could easily be just a web interface to create new videos. There are different ways of diversifying the output according to different needs and contexts. Okay, so then I guess we had a question in the very back. Yeah? Hi. First of all, thank you very, very much. I think your approach of working with the national archives is wild and really, really interesting. One of the things that I find really interesting and important is to ask, first, the question: what is deterministic, what is non-deterministic, the epistemology behind these kinds of approaches. What I'm really interested in is when you take an archive, and part of my question is already answered, because what sorts of things are you picking for people? And the other thing: when you start augmenting an archive and basically rewriting memories, and you give people interpretations of an archive, what do you think are the wider implications of that? Can you repeat the last sentence? What do you think are the wider implications of that? Do you think an archive has to be an accurate representation of something, or do you feel it can be augmented and rewritten?
Before we started this project, which is a huge collaboration between different organizations, with research institutions and also archival people involved, we defined this whole work as not being about the accuracy of the archive; it is an entertaining, public-engagement kind of thing for the archive. So the scope of this work doesn't really include giving an accurate representation of historical events, or having any one thing archived holistically. From my point of view, this work is focusing on another perspective of the archive's public value: it has to engage the public, and it has to allow the public to understand that the archive is not just about certain events, but about the overall history of a society or a culture. So there are things we wanted to emphasize, and those things are not the most accurate historical events or the most popular things of the past, if that makes sense. Okay, thanks. So, we have more questions. I'm trying to formulate my question, because I'm also very interested in this recontextualizing of the work. I wonder how important the material in the archive is to the actual outcome. Could it just be any image? Could you just be using some Google search and do the same thing? Or is there some connection to the material that is in that specific archive? And just to give you a little context: I am also an archivist, or I work with archives, but all of that material is related to the history of computer graphics. So I'm trying to envision whether what you're doing would work with mine. No, I don't think so. So how does it work with the actual content? What is the relationship? To answer very briefly, I think this could be applied to any content, in a way.
If you wanted to apply it to Google, or to any open-source or proprietary stock video or image archive, then it could be a tool for people to make documentaries or do experimental filmmaking, in a way. But in the specific context of this project, we used the Swiss historical archive, and we are aiming at the Swiss public. The context here is that this is old Swiss happenings, and then there is the new generation of Swiss people, or people who have been expats in Switzerland for ten years and have never seen this kind of thing on Swiss TV before. So I guess there are some prerequisites, in terms of context, for this to make sense, so that Swiss people would then think: oh, the times have changed, things are different now. But it could be catered to various sorts of contexts or sources of materials, based on what you want to achieve in the end, because this one has a more or less specific objective in mind that we wanted to showcase: first, the changes that have happened throughout the history of Switzerland, and also to give the public who have not really been fond of watching Swiss TV access to the TV content, in a way. I don't know if that answers your question. Yeah. Can I ask an extension to that question? I'm sure that among the people who come into the museum to access this, you probably get some older folks who are familiar with this material. What kind of feedback do you get from them? I would imagine they would bring nostalgia to it, their past experiences with that television show, because all the material is from television shows, right? Okay, so, and even the children may have heard about these shows from their parents, or seen them on YouTube or something. So have you gotten feedback from people who actually know these shows and see them recontextualized in this way?
Not at a very large scale. This was shown at the school, because the headquarters of the TV station is going to be on the campus. So we showed it to the people working at RTS, the television station, and their friends, and they think this is a very entertaining way for them to show their kids how things were done in the past. They wanted to say: we used to make phone calls with this type of thing. But we are doing a large-scale evaluation of all of the interfaces at the end of this year or early next year, across different participating cities and campuses, so we'll get more feedback by then. But that was something we were also curious about. Thank you. We have one more question here in the front; I guess we still have time for that. I hope. Thank you very much for your presentation. I find it very interesting, even though I think it is more like a tool waiting to be applied in different contexts, for different archives. My background is in digital art history, and I'm also interested in working with archives. I just finished a project on Woody and Steina Vasulka's digital archive. The content is images, videos, texts, separate outcomes of the sources for this application. And I was thinking during your presentation: maybe there can also be other applications in the case of archives of visual art, which are usually connected, not only individual images; there is so much historical writing, interpretation, and philosophical and historical context, that it might help to synthesize all the knowledge and to present these artworks not decontextualized, but in context. So do you think it is possible to use it this way? For example, the music of the time can also be connected, whatever, just to synthesize it meaningfully. I mean, yes and no, I guess. Yeah, I think it's not such an easy idea, I can imagine.
In terms of this joint embedding, the big companies are now including every modality in it. Facebook has ImageBind, which basically co-represents every modality you could think of: thermal images, depth images, CT scans or whatever, and music, video, audio, all that kind of thing. Everything can be bound into one representation in a giant latent space. So in that sense, maybe yes: you could find a space that works with heterogeneous data modalities that could also be accessed through, let's say, natural language, or you could even provide a sound to match whatever the model thinks, in the latent space, is most similar to that song. But in a way, no: if you are emphasizing the context or the accuracy of the materials, and you are accessing through the accuracy of the material, then it becomes a question of how you control these kinds of representations so that they are completely accurate. It can be wrong; there are hallucinations, and semantic gaps that are not avoidable. So it's no in the sense that you can't just say this type of thing can be applied to any scenario, especially when you want accuracy or determinism; but I guess with certain measurements, or certain additional layers of control, it could sort of achieve that. As you said, this would be a tool for people to use, and based on different scenarios the tool has to be adapted a little bit as well. I understand; it is in some ways a limitation of the potential of the AI tool. On the other hand, I think that you usually get a digitized archive that is already full of metadata created by humans, the librarians and archivists. So there can be a connection, but usually, for example, an archive of images is not connected with an archive of texts, like books and literature.
And this way, maybe there can be some potential, some new knowledge that it can bring. We can talk about this offline, I guess. It is getting late; I guess we can talk right after our session, because we're already a bit over time, but I hope that's fine. That concludes our session. Let's thank all the speakers again, all the artists, Yu-Chen and all the others. Thank you so much. These were really excellent art papers that we saw here. And, well, that concludes our session. It's Friday evening, the weather is nice, so I guess you should enjoy all the entertainment that the city of Linz and the Ars Electronica Festival have to offer. And yeah, see you around. Thank you. Tomorrow, I guess, we continue at 11. Okay, tomorrow at 11, we'll see each other again then. Thank you.