Okay, welcome everyone. I can see the participant number ticking up as people join this demos, experiments and Q&A session today. Thank you for joining, and hopefully you'll find lots of relevance and interest and things to ask questions about over the course of the session. That said, I'm just going to give it a few more minutes in case anyone else is trying to join or find the link, and then we'll get started.

Okay, so let's go to the next slide. For those of you who joined the earlier sessions today, you'll have heard these notes already, but just to make you aware of Crossref's code of conduct. And if you are on Mastodon or X, then you can join the discussion there; we'd welcome you to do so on those platforms. But I think probably the most effective thing for this session is, if you've got questions, to use the Q&A box to put those to our presenters. We're going to take questions for each of the presenters after each of their sections, so that they're still fresh in your mind. And we will share the slides and recordings afterwards. Next slide.

A reminder: you've got, I think, around two hours left, so if you haven't voted for the 2023 board slate, our ballots close pretty soon. If you wish to submit a vote or change your proxy vote, then let Lucy know via her emails on this. We've got a really good, interesting slate of candidates this year, and part of being a member of Crossref is having the capacity to vote and make sure that our board is as representative as possible of you, your communities and your needs. So we'd really encourage you to do that. Next slide.

So I'm going to start by introducing the session. As I said, this is product demos, experiments and questions for those who are willing to present. The aim of this session is to give a view across the things that we are at the early stages of exploring and working on, both from the product and development teams, and you'll also see strong representation from the Labs team at Crossref. We're going to give some hints and tips for using our APIs — ones that you're probably very familiar with, or would like to get more familiar with, and newer ones as well — and then external integrations with Crossref from our colleagues at the Public Knowledge Project, for whom it's very early, so thank you, Eric. Each demo will last around 15-20 minutes and we should have time for questions for our panelists. Luis and I are going to keep an eye on the Q&A and try to make sure those get to the right people at the right time.

So, our first demo is of something that's currently under development — a couple of these things are under active development.
We've tried to step away from them and not touch them for the purposes of the demos today, but as any of you who have done a live demo know, they can be all sorts of fun. So kudos to my colleagues. Our first demo is of our currently-under-development registration form for journal content. Lena Stoll will share a sneak peek of the work that we've been doing to create a simple user interface to make it easier for members to register good quality journal article metadata. So over to you, Lena.

Thanks, Rachel. We're actually starting out quite safe — I'm not doing a live demo, I just have screenshots and slides, so everyone can relax for the next 10-15 minutes; then it gets really exciting when people start doing things live. Great to be here today. For the many of you who don't know me yet: as Rachel said, my name is Lena, I joined the product team at Crossref over the summer, and I'm based in beautiful Germany. I'd like to tell you a little bit today about the work that we've been doing to develop, as Rachel said, a simple user interface, or UI, to allow members to register journal articles without having to touch an XML file. I say "we have been doing this work", which is quite generous to myself because I've only been here a few months; on the product side, most of this is the result of a lot of work by my colleague Sarah Bowman, who many of you might know better. I've taken on this work because she recently went on maternity leave — so just in case you're wondering why there's a new face talking about this. And on the tech side, a shout-out to Patrick Vale — I know you're watching.

Okay, so what we're doing here is basically an extension of the concept behind the new grants registration form that we released, I think, towards the end of last year, which some of you might have used before or seen Sarah demo in the past. So a lot of this might be familiar if you know that tool. If you go to the next slide, please.

I just want to acknowledge that some people in the audience might be thinking: there are already multiple ways of registering records, especially journal articles. Ten points to you if you recognise all of the options I've represented on this slide, and if you can name them all. We have the trusty old web deposit form, which you can see at the top right. We have Metadata Manager. We have a plugin for OJS, for those that use that platform to manage their journal articles. We have an admin console where you can upload XML files if you're able to create them. And then of course there's the option of depositing XML in an automated way to our API using HTTPS POST, which is represented on the bottom right.
That is of course the most reliable, available and efficient way of registering metadata, but we do know that today the majority of our members are either institutions or small publishers; not everyone runs an XML-first workflow, and not everybody has the technical resources, or knows how, to create XML in an automated way and deposit it. So we do want to offer these user interfaces that allow everyone to at least perform all of the most essential tasks you need to do to be a good Crossref citizen. But the goal isn't, and can't really be, to represent absolutely everything that you could do with direct calls to our APIs through a user interface, because if you're familiar with the metadata schema and the sheer complexity behind everything that you can in principle register with us, you'll understand that it's not really realistic to do that in a form that people still have a fighting chance of actually using. But that's just an aside — getting off my virtual soapbox.

I guess the main point of this slide is just to say: all of these different routes, all of these different roads, ultimately lead to the same database and the same API. So why would we bother creating yet another tool for depositing? If you go to the next slide: actually, each one of the tools that we've built so far that works around the requirement to work with XML files directly serves its own little niche of use cases, and you have to be pretty familiar with the Crossref tool ecosystem to understand which one is the right one to use for which purpose. Each of our tools also has its own specific limitations. For example, Metadata Manager is very complex technically, and for that reason it has a number of bugs and issues that we're aware of and have never been able to fully address. The web deposit form is a little bit awkward to use and doesn't cover all record types, and so on and so on. All of this together leads to a confusing user experience for those members who are trying to deposit their metadata manually, and it's also quite inefficient for us to maintain all these different tools in parallel — they're all different code bases, and if we make a change to the metadata schema, for example, we then have to make changes in multiple places to reflect that across all of our tools.

So what we've been building — since we started with the grants registration form last year, and now with this journal article registration form — is different, because it is being built in what we call a schema-driven way. By that, I just mean that we're creating the interface, the form that you actually enter the metadata into, in an indirect way from the schema itself: we take the schema as an input and then create a form out of it. Which means that if there's a change to the schema, it's much, much easier and faster for us to reflect that in the actual interface.
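To make the schema-driven idea a bit more concrete, here is a toy sketch in Python. The field definitions and the validation rules are invented for illustration — this is not Crossref's schema or the real form's implementation — but it shows the principle of deriving both the form fields and the inline validation from a single field specification, so a schema change only has to land in one place.

```python
from dataclasses import dataclass
import re

# Invented field definitions for illustration only -- NOT Crossref's schema.
@dataclass
class Field:
    name: str
    label: str
    required: bool = False
    pattern: str = ""  # optional validation rule (regex)

# Imagine this list being derived automatically from the metadata schema.
ARTICLE_FIELDS = [
    Field("title", "Article title", required=True),
    Field("doi", "DOI", required=True, pattern=r"^10\.\d{4,9}/\S+$"),
    Field("issn", "Journal ISSN", pattern=r"^\d{4}-\d{3}[\dXx]$"),
]

def validate(record):
    """Check user input against the field definitions, like the form's inline checks."""
    errors = []
    for f in ARTICLE_FIELDS:
        value = record.get(f.name, "")
        if f.required and not value:
            errors.append(f"{f.label} is required")
        elif value and f.pattern and not re.match(f.pattern, value):
            errors.append(f"{f.label} does not look valid")
    return errors

print(validate({"title": "An example article", "doi": "not-a-doi"}))
# -> ['DOI does not look valid']
```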
It's also an interesting proof of concept for the wider community, to show that it is possible to build your own interfaces in that way, to suit your specific needs. Because, as I said earlier, we will likely never be able to build the ultimate single tool that is usable but also allows you to do absolutely everything you could do if you were talking to the XML API directly.

The idea is that if we build a unified set of forms — like the grants form, and now the new journal article form — eventually we'll be able to replace some of the tools that we know aren't really serving the community in the way that they could. I'm talking specifically about Metadata Manager, which, if you've used it lately, shows a little banner at the top: you'll know we've been planning to deprecate it for a while now. If you want to read more about the reasoning behind that decision and why we're doing what we're doing, there's a blog post linked on this slide that Sarah wrote a little while ago called Next Steps for Content Registration. We're also building all of our new tools in a way that makes them easily and automatically translatable — a process called localisation — and so that they can be used by users with disabilities.

Okay, so that's the preamble. If you go to the next slide, I just want to show you a few screenshots — I hope they're not too small on your screens — of the current state of the prototype of this form that we've been working on. It's still at a very early stage, as Rachel's little disclaimer and warning at the beginning suggested, but I just want to give you a bit of a flavour of what I'm talking about. If you've used the grants registration form in the past, or if you've seen it, then a lot of the design of this interface is going to look quite familiar to you. We're deliberately keeping it very simple — sorry these screens aren't particularly shiny or polished — and we're focusing on the most important, most widely used fields in the metadata schema for journal articles. Again, not every little thing that the schema in theory allows for in the XML can be represented in a form without making it so complex that it doesn't work for anyone any more, but we're focusing on the most important fields to make sure we can meet the needs especially of those types of members who we know can't afford to automatically generate XML and who we know need these kinds of tools.

In terms of how the tool is used, we've conceived of it as a kind of wizard that guides you through the sequence of necessary steps one by one — you can see a little stepper at the top. And any time you enter something or try to complete a step, the tool will validate whether what you've just entered makes sense and whether it fits what's allowed in the schema.
That's just because, of course, any time you make a tool that's used by human beings, there's the capacity for people to make typos and copy-and-paste errors — especially with such a tiresome task as registering dozens of journal articles by hand. If you've done it before, I'm sure you've made some copy-and-paste errors, and it's a lot more costly if you have to correct those after the fact, so it's very important to validate inputs early on, which is what we're doing. You can see an example of this on the screenshot, in red towards the bottom, because I've entered something invalid.

Also, if you're a very keen observer, you might have seen at the top right that there is a language switcher button. This is what I mentioned earlier about localisation: a really important thing we want to do with any of the new tools we're building is to make sure they can be used in other languages — especially those languages we know a lot of our members are much more comfortable with than English. The form can also be navigated using your keyboard, so we're designing it to be accessible enough that it can be used by anyone in the community who actually needs it.

If you look at the stepper at the top, you can see that the form asks you first for information on the journal, then goes down to the issue level, and finally to the article itself that you're registering. If you go to the next slide you'll see some of the issue screen. There aren't going to be many surprises there in terms of the fields represented. You might have noticed already at the top that several titles can be entered, for the journal, for the issue and also for the article — that's because we support multiple languages for titles. And like I said, we're focusing on the key metadata fields, those that will be highest value for making a high-quality record and for giving the people who use the metadata a chance to discover that content. That's the issue level, and if you go one slide further you'll see the article level.

You'll be able to see that in this example I've given the article a couple of different titles in different languages — I don't speak the other language myself, so this had to do. There's also a ROR lookup already built in: if you look closely, you can see for the contributor that if you start entering the affiliation, the tool will find the unique identifier in the ROR database for it. That's something that was quite a crowd-pleaser in the grants registration form. And there are more of those kinds of quality-of-life improvements for using the tool that we already know we'll definitely need or want to offer in the form once we actually have it go live for members to use.
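The affiliation lookup described here is the kind of thing the public ROR API supports. Below is a minimal sketch of an affiliation-matching query, independent of Crossref's form; the parameter and response field names follow my reading of the ROR API documentation, so verify them before relying on this, and the affiliation string is just an example.

```python
import requests

# Query ROR's affiliation-matching endpoint with a free-text affiliation string.
resp = requests.get(
    "https://api.ror.org/organizations",
    params={"affiliation": "University of Oxford"},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json().get("items", [])[:3]:
    org = item.get("organization", {})
    # "chosen" marks the match ROR considers confident enough to pick outright.
    print(org.get("id"), org.get("name"), item.get("score"), item.get("chosen"))
```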
Just right now, not all of them are there, because we've been focusing very much on getting a prototype, as quickly as possible, into a state we can share with all of you and with the wider community to get some feedback.

If you go one more slide ahead, we'll go to the fourth step, at the end of the whole registration process. Again, if you know the grants registration form you'll know what this looks like: we'll be able to give the user a JSON file download of the record they've just created. The XML actually doesn't visibly get involved at any point — it's created in the background by the tool automatically and deposited in much the same way it would be if the user used our XML API directly. These JSON files are another one of the things we know we'll want to build on, because it makes a big difference to how useful the tool is: we will allow users to load those JSON files back into the form later and make changes, which is something that is already the case in the grants registration form. For this very first prototype we're focusing just on the initial registration part so that we can test that with the community, but we know that editing records after the fact, especially for journal articles, is something that people use Metadata Manager for a lot. So that's just a note on that.

And actually, on that topic, if you go to the next slide — oh, some people are having issues seeing the slides; I'm just going to pretend I didn't read that and keep talking if it's working for most people.

Yeah, I think we're okay, but we'll have a look.

Okay. So, since I've already used the word "soon", obviously the next questions are: what's coming next, and what are the timelines? Like I said at the beginning, we weren't quite ready and confident enough to do this live yet, to go into the actual prototype, enter things, and show you how it reacts, so I took the safe route of taking screenshots — sorry about that. We can't quite share a link to the prototype yet either, for you to play around with yourselves, for the same reason, but like I said, the goal is to get there as quickly as possible. We're just putting some finishing touches on it, and it's again a very early version, so don't expect to see it in production tomorrow just yet. But of course we'll keep our community updated as we make more progress. We're going to be iterating over the coming weeks and months on early initial feedback that we'll be getting both internally and from some key members of our community. Actually, I should say thanks at this point to those of you who are Crossref ambassadors and have already volunteered to help us test the prototype — we'll be getting in touch with you very soon. And there are also some people who volunteered on the community forum in the past to be involved in this process, which is great.
And then in terms of some future next steps: again, if you've used Metadata Manager recently — or even not so recently — you will have seen a little banner saying the tool is going to be deprecated. It still says 2023 right now as the sunset date, but we don't want to rush this while it's still needed, because what we're building out is a minimum viable version of this new tool — that's what the MVP on the slide means, minimum viable product, for those who aren't so deep in the tech rabbit hole. We don't want to rush this, so I've put 2024 here because that's probably more realistic. The idea is that once we've more formally reached out for community feedback, like we always do with such an important project, and we've iterated on that and are getting confident that we have a viable version of this tool, we will be able to lay the ghost of Metadata Manager — a bit of a Halloween theme there for you — to rest, finally, and look to the future.

So if you're one of those people who really resonates with what I've been talking about, who maybe currently uses Metadata Manager or even the web deposit form to register your journal articles, and you want to help us shape this new thing, then absolutely feel free to get in touch with me. I think the next slide will tell you how to do that. Or not — okay, that was supposed to be a thank-you slide with my email address on it, but I'll make sure to add that after the fact, I guess. Sorry about that.

Yeah, we could put it into the chat as well.

Yeah, and our email addresses are also pretty easy to guess if you know the pattern. But I also wanted to say that the community forum is always a great place to be if you want to stay up to date with what everyone is doing at Crossref — it's one of the first places you'll find out when there's news or something to test or give input on. So, a shout-out to the community forum; I'll put my email address in the chat, I look forward to speaking to some of you about this more, and for now I think that's it for me.

Oh, thanks, Lena. And there's already a question for you in the Q&A.

Yes, okay. We have a question that says: given the drive in Crossref to get publishers to register reference lists, will that mean an option to upload reference lists, either in their entirety for parsing or as a list of DOIs only?

That's a very good question. Of course, we know that references are a very, very important metadata type, and also one that we've been pushing our members — when we discuss their participation reports with them, and all that kind of thing — to really consider investing in. So we know that we want to be representing that in this tool in some way.
We haven't really made a definitive decision yet on what the best way of doing that is — whether there will be some way to use the Simple Text Query tool that we have to get DOIs for references if you don't have them yet, or whether it's better to do it the other way around. I don't think I have a definitive answer on this yet, but that's exactly why we're trying to get this out for community testing and feedback as quickly as possible, so that we can nail these sorts of things down. It's the kind of thing we don't want to make a decision on too early, before we know what's actually most valuable. So if you think you have input on that, then do get in touch with me.

And I can see there are other questions — and I think some advice as well — but I want to keep us moving and jump on to the next presentation so that we don't run over time. We can answer those in the Q&A as well, and come back to them if we've got time at the end, I would suggest. Is that all right?

Perfect, thanks. Yeah, that's fine.

Cool, thank you. Okay, so next up we have Eric Hampson from the Public Knowledge Project, or PKP, who's going to demonstrate the latest Crossref plugin. In OJS 3.4 the DOI workflow has been completely rewritten, addressing usability issues and providing a better overall user experience. With so many of our members using OJS extensively, these improvements are really timely, and thanks to the team for sharing them with us and giving us the opportunity towards the end of last year to test them out. Eric, can we hand over to you?

Yeah, thank you. I'll start by sharing my screen, if I can. ... It does not look like I am able to share my screen.

Would you like to have another go?

Yes. ... Oh no — perhaps typical live demo luck. Zoom recently reset on my computer and unfortunately I do have to briefly log off of Zoom and restart. I'm terribly sorry about this.

What we can do is tweak the order a little bit, if you want to leave and then rejoin.

Yeah, that sounds perfect.

I think a couple of us have been hit with Zoom updates this morning, which is of course another variable in all of this. We'll keep a lookout for you and add you back in, and I think that will work. Martyn Rittman, are you happy to go next?

Thanks. Yeah, no problem, I can jump in. So let me share my screen and hope that it all works.

Yep, that works.

Good. So yeah, this is a presentation about a new API endpoint that we have.
I'm going to demonstrate it via a Jupyter notebook. I realise this is in Python and some of you might not be familiar with Python — don't worry, I will explain and point out the most relevant bits as we go through. Basically, this is looking at different ways that we can look at Crossref metadata. I'm going to talk about relationships: firstly how you can get hold of relationships, and then how we will combine those into one endpoint in the future. If you are familiar with Python, all I'm doing here is using the requests library to query APIs. Here's the endpoint — if you'd like to try this out, you can go to this URL, and I'll give you a few more pointers at the end. I'm not running this off a Colab notebook, but I do have a Colab notebook and I'll share the link for that at the end as well.

We're just going to start the notebook with a few useful functions: one to show you the URL that we're querying — you'll see that in a moment — and then this is the function that does the work: it queries a URL and expects a JSON output. JSON is the format of the data that gets returned by our APIs.

We were talking just now — very timely — about how references are very important, and many of our members send us references. What we do at Crossref is expose those through the works endpoint. Each item has an entry, and you can query the works endpoint using this URL — api.crossref.org/works — and then we have this filter which says: does this item have references or not, just give me the ones with references. Just to give you examples, we have 8 references for this DOI, 89 for this one; I don't know what this one is, maybe it's a book or something, but 424 references — I feel sorry for the poor production people who have to go through all of those.

What do these actually look like in JSON? Well, this is the JSON list of references — I've just pulled out the reference part of the response. We can see that this DOI here is citing a number of things, and this is one of them that I'm highlighting: it has a DOI, it also has the title and the authors, and we've got this unstructured reference as well. You don't have to parse references to deposit them with us; you can have just unstructured text like here. You don't even need a DOI — we will try and match that for you. And you can see more examples with similar kinds of information. So that's references.
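For anyone who wants to try the works-endpoint query being described, here is a minimal sketch along the same lines using the Python requests library. The mailto address is a placeholder you should replace with your own, and field and filter names are worth checking against the REST API documentation.

```python
import requests

# Placeholder contact address for the "polite" pool; replace with your own.
MAILTO = "you@example.org"

# Ask the works endpoint for a few records that carry deposited references.
resp = requests.get(
    "https://api.crossref.org/works",
    params={"filter": "has-references:true", "rows": 3, "mailto": MAILTO},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["message"]["items"]:
    title = (item.get("title") or ["(no title)"])[0]
    # "reference-count" is how many references were deposited for this DOI;
    # the "reference" field (when present) holds the reference list itself.
    print(item["DOI"], item.get("reference-count"), title)
```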
There's another way that you can add links between outputs, in slightly more descriptive ways, and that's using the relationships part of our schema. Again, I'm going to query the works endpoint — the same endpoint, but instead of saying "has references", the filter says "has relations": does it have a relationship? And that's run nice and quickly, thankfully — it's always risky doing a live demo.

Here's just one example. We have the type of relationship — I've just pulled the relationship part of the JSON response — and then it says what the relationship is to, as well. This is just an identifier; it says it's a DOI, and you can add other types of identifier in there as well. And we say who it's asserted by. "Subject" means it's asserted by the member that deposited this metadata; if it said "object", it would have been asserted by the member who deposited the metadata for the related item. So there are two sides to a relationship — this happens in life as well: one person says one thing, another person says another, and as you know, they don't always agree.

Here are just a few more examples. We have reviews, we have comments — these are mostly reviews — and this one is an article with a preprint. We can see that some of these were asserted by the member who deposited the object metadata, and some were asserted by, for example, the member who deposited the comment and said "it comments on this". And, as has already been mentioned, we match up the reverse relationship here. So that's kind of nice, but you might notice that this output looks very different from what we saw before with the references — a different kind of metadata is available.
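A minimal sketch of this kind of relationship query, again with the requests library. The filter name below follows the description in the talk, so confirm the exact spelling in the REST API docs, and the mailto address is a placeholder.

```python
import requests

# Look for works that declare relationships; verify the filter name in the docs.
resp = requests.get(
    "https://api.crossref.org/works",
    params={"filter": "has-relation:true", "rows": 2, "mailto": "you@example.org"},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["message"]["items"]:
    # The "relation" field maps a relation type (e.g. "is-preprint-of",
    # "is-review-of") to related identifiers, each recording whether the
    # link was asserted by the subject side or the object side.
    for rel_type, targets in item.get("relation", {}).items():
        for target in targets:
            print(item["DOI"], rel_type, target.get("id"), target.get("asserted-by"))
```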
I also mentioned briefly that you can link to things outside of Crossref. We have a project called Event Data, where we go and look for relationships involving Crossref DOIs that are mentioned around the web — for example on Wikipedia, in Hypothes.is annotations, on Reddit, and on blogs and websites. I'll just show you one example here, looking specifically at Wikipedia. Now, the eagle-eyed among you might have noticed that the Event Data API is being closed down from the end of next month. This is because, while we will continue to collect events, we want to focus on the new endpoint — the relationships endpoint, which you'll see in a moment. The best use of our resources is to stop with Event Data, put all of the event data into the relationships endpoint, and make that usable for the people who are relying on it. So if you are using Event Data, you'll want to look at the relationships endpoint in the next few weeks.

This is an example of an event, and you'll see that again it looks very different to the references, and different to the relationship metadata that we saw, but essentially we've got the same kind of information. We've got an identifier for the subject — in this case it's a URL, a Wikipedia page. We have an identifier for the object, which is a Crossref DOI. And we say how these two things are linked together — in this case, as a reference.

So you've now seen three different ways in which we essentially deliver the same metadata. We've been talking a great deal in the past few months and years about the research nexus, and that doesn't really give preference to any kind of item or any kind of relationship. So we said: why don't we reflect that in our APIs and provide all of the relationships that we know about through a single endpoint and in the same format? That is what we've been building for a while now. As I said at the beginning, this is the URL, and here I'm just looking at a single day — we've got a number of filters, time filters for example, on the output — and you can see that on this day there were about 2.4 million relationships. I'm not going to show you all of those, but this is just one of them. We've tried to pare the record down to the minimum useful information, and we'd love feedback as to whether this is sufficient, whether we need more or less, and what would make this useful for you.

So we have a subject — this relationship comes from a Crossref DOI — and in this case it goes to an organisation. This, I can tell you because I've looked at it way too often, is Wiley, and it says that this DOI was published by Wiley, essentially. And instead of just saying "subject" or "object" for who asserted this information, we're using a ROR identifier for Crossref: basically Crossref is asserting that Wiley is permitted to manage the metadata record of this DOI.
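The dedicated relationships endpoint was still in early testing at the time of this session, and its exact URL and parameter names are not spelled out in the transcript, so everything below is a hedged sketch: the URL, parameter names and response handling are placeholders to illustrate the kind of query being described, not a documented interface.

```python
import requests

# Placeholder URL and parameter names: substitute the real ones from the
# relationships endpoint documentation / the shared Colab notebook.
RELATIONSHIPS_URL = "https://api.crossref.org/beta/relationships"

resp = requests.get(
    RELATIONSHIPS_URL,
    params={
        "from-updated-time": "2023-10-01",   # hypothetical time filter
        "until-updated-time": "2023-10-01",  # restrict to a single day
        "object-type": "Dataset",            # hypothetical filter for data citations
        "mailto": "you@example.org",
    },
    timeout=60,
)
resp.raise_for_status()

data = resp.json()
# In the demo, each relationship record carries a subject, an object, a
# relationship type, and who asserted it (Crossref itself is identified by
# a ROR ID). Inspect the response to see the actual structure.
print(str(data)[:500])
```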
In the first instance we are focusing on data citations, which have been mentioned quite a number of times already, so I'm really happy to have heard that — hopefully I don't need to give too much motivation for why data citations are interesting and important. Again we're going to query the relationships endpoint, look for things that have the type "dataset", and just look at a single day to see what we find. And we find there were actually a hundred and forty-seven data citations found on this day. Now, the subject type "Dataset" with a capital D is used by DataCite, so these are actually not data citations deposited by Crossref members; these are data citations deposited by DataCite members. They very kindly pass that information on to us and we're able to make it available through to our endpoints, and we do that reciprocally as well.

Let's just have a look at the output there. We can see there are various relationship types in use. So this DOI here is metadata for this Crossref DOI; this Crossref DOI is posted content, and it is cited by this DataCite DOI, which is in Zenodo, and so on and so forth. Of course, I don't know whether doing a live presentation on this makes sense, but the idea is that this is machine readable, and if you want to look up more information about a DOI you can just go to our works endpoint and retrieve the metadata for it — this endpoint is really focused on the relationships. Looks like I should have shortened the output for this one.

Anyway — and this is the one I'm not convinced is going to work, because this query has been taking rather a long time at the moment — you can also query for everything which has been registered by DataCite. I've put a shortened version of the output here, but this is the output that you would see. So that's just to show you more kinds of queries that you can make. I'm not quite sure how I'm doing for time here — shall I carry on?

Yeah, we can let people get to the end, but just about a minute left would be great.

Okay, fine. So just very, very quickly then: you don't have to look at DOIs — you can look at ORCID iDs, you can look at organisations, and there's funding metadata in here as well. So here we can see the works this ORCID iD was an author of; we can see that this member has deposited 50,000 references; and we've got funding information in here too. Sorry to blast through that very quickly, but I will put the Colab notebook link into the chat and you can have a look for yourself. Thanks a lot for your attention.

Awesome, thank you very much — that was really interesting. There's the link to the notebook for anyone to have a look at. I am really conscious of time, so maybe one question we can take, and then we can get to the other ones in the chat.

Sure.

Alright, perfect. There is one question maybe you can answer very quickly.

Yeah: "Is Event Data going to be available in the relationships API from day one?" The answer is, actually, probably not — there will be a gap. We have committed to making all of Event Data available by the end of January 2024. But Event Data, for those of you who have been using it, has been unstable for quite some time, and we've decided that it's better to focus on something new and stable, so there may be a gap in which some events are not available.
We've really focused on data citation because we know that's a very important use case, and all the data citations that we know about will be available from day one; the rest we will add gradually over time.

That's great, thank you. There are a few other questions in the chat that you can pick up. Thank you. Eric, should we try again?

Yes, fingers crossed. Thank you so much for moving things around for me.

No worries. Yep, we can see your screen.

Perfect. Okay, so jumping back to a few minutes ago. For those of you who are unfamiliar with Open Journal Systems, or OJS, just a brief overview of what it is: it's an open source journal management and publishing platform. It encompasses the submission workflow and peer review, as well as the production workflow, and it's currently used by more than 30,000 journals worldwide. Today I'd like to share with you a little bit about the new DOI workflow in the latest release, 3.4, which came out on June 9th. But in order to provide a little context around that, I'd first like to show you a little bit from the previous version, 3.3, to set up why these changes are so big and helpful, especially for people who are managing lots and lots of DOIs.

For those of you who are unfamiliar with OJS, this is the home screen an editor would log in to. I just want to show an example of what some of the pain points are when dealing with DOIs in the previous version. To start with, one of the major ones is around DOI creation: it's spread out across many different places throughout the application. For example, if we go to this published article — currently I only have article DOIs enabled, but I'll show where all of the others would go — this is the publication tab for a published submission. The article DOI would be under here, under the identifiers tab. If you have a galley, it would be separately under here. And issue DOIs would be under yet another tab entirely.

Another pain point, which you can see from this big red banner, is that this version has been published and cannot be edited. It often happens that an article is published — this is especially prominent with preprints — before a DOI has necessarily been assigned to it. Assigning one afterwards has previously been impossible in OJS, which encourages bad practices like unpublishing the item to update the metadata and then republishing it.

I'll highlight one other pain point here, around the previous DOI pattern — what we call the default DOI suffix generation in OJS 3.3 and earlier. If you come to assign the DOI, you'll see this warning, which says you cannot generate a DOI until this publication has been assigned to an issue.
I'll just quickly show at a glance what that entails, by going to the DOI settings for this version. In OJS 3.3 and earlier we use pattern-based suffix generation, which contains semantic information about the item in question. The most common pattern is for articles, and it includes information about the issue, the volume number, and IDs inherent to the item from OJS itself. This is often very problematic because a DOI cannot be assigned to an item until it has been assigned to an issue, and maybe an item shouldn't be assigned to an issue yet — or items get shuffled around — but you would still like to assign it a DOI as part of pre-production.

So, that's a little bit of background on some of the pain points of DOI generation, which brings us to 3.4, with the same view here just as a starting point. I'll do a quick walkthrough of the process of setting things up, as well as show some of the highlights of the new workflow. First of all, you'll notice DOIs featured prominently in the sidebar here. One of the major changes is that DOIs are now a core part of the application. Previously they were a separate plugin, which limited the amount of integration into the application that was possible. The Crossref plugin, and any of the other registration agency plugins, are still plugins, but this new architecture allows them to be more deeply integrated into the application, and I hope that as I go through some of this today you'll be able to see that.

To start with, I'll show you what some of the settings look like, just to provide some context. DOIs can be enabled or disabled globally here, as part of the distribution settings. Next up — and this is very similar to previous versions — are the items that get DOIs; apparently I only have articles selected. This also varies across the different software applications that PKP makes — there's one for monographs and one for preprints — but it's roughly the same idea. One new thing here in particular is that, by default, anything that OJS can create a DOI for is possible here: right now that's the article, and galleys, which in the most default context means the published PDF. For the Crossref plugin this option is disabled, and this can be configured for other registration agencies if, say, they don't take issue DOIs, for instance. It's potentially more applicable for monographs, where different services do or do not take chapter DOIs, or don't have the capability to distinguish between those types of things. And I'll show what that looks like here.

The next big thing is automatic DOI assignment.
The philosophy around this was just to be able to generate DOIs as early as possible, so that they're there as part of the production process, part of the layout process. It can be done on reaching the copyediting stage, which essentially means as early as possible; on publication; or never — in other words, manually. And when we look at the pain point from OJS 3.3, what this has a bearing on is that you needed to have assigned things to an issue. That is no longer the case.

The default suffix generation now creates an eight-character unique suffix. This is a pattern we took inspiration for from a DataCite tool that would create an eight-character suffix, and it includes a checksum digit so it can be checked programmatically to make sure there are no typos, in a similar way to ISSNs or ISBNs. One of the other big wins is that, because it's a unique character suffix, it does not contain any semantic information, so there is no temptation for editors or journal managers to use it to convey any sort of semantic information that they might want to change later. For instance, if the name of the journal changes, they won't want to change their DOI suffix, because it doesn't carry any information about that to begin with.

The next bit is that all information about the registration agencies has also been co-located here. The name of the game with a lot of these changes is co-location: we want all the settings to be co-located, as well as the actual DOI management itself. So if we go here, we can check which registration agency we'd like to use — right now I have Crossref set up, which makes the most sense — and all of those settings are co-located here. Automatic deposit is still functional, and all of the settings are here, along with some added context about the username and the roles for setting those up. I'll go ahead and enable that, and then move on to the new DOI manager interface.

This interface will be more or less the same regardless of which registration agency is used — if a registration agency plugin is enabled at all — and it will be the same across all of the applications. This was really done in an effort to unify the way that journal managers and editors work with DOIs. A few things I'd like to highlight around this are the basics of DOI assignment, as well as some quality-of-life improvements. Some of the first things that jump out are the red badges here. These show the top-priority things that a journal manager would want to see when they come to this screen: whether items still need DOIs assigned, whether published items are unregistered, and whether there are any errors in the registration process. And we can filter items based on these things.
Say you want to see all of the items that need DOIs — that includes everything, because nothing has a DOI assigned at this time. There is an explanation of what all of these individual statuses mean, as well as the filters, which can be viewed if you are looking here at a later time. It's also possible to filter by issue, so if you're using an issue-based workflow you can manage everything there.

One of the other big improvements in terms of workflow — for time, I didn't show an example of this in the pain-point section on 3.3 — is the ability to do bulk actions. To paint a picture of it, imagine you had 300 items that you needed to assign DOIs to, or re-register. Previously you had to tick all of these manually, and that was just a quirk of the design of the forms. Now we have bulk actions, and this is where the heart of a lot of the functionality is. We've filtered for the issue we want, we've selected all, and now we'd like to assign some DOIs to these items. And we can now see that the status has changed from "needs DOI" to "unregistered". DOIs can be edited under this expanded view: if you have more than one DOI for an item, or if you're working with monographs and have more types of items that carry DOIs, those can all be managed and edited here. And once you're ready, it can be deposited.

I'd also like to take a moment to highlight this as the new type of suffix. It is still more or less human readable, striking a balance between being easy to share and read without being too long, while still permitting enough unique possibilities. The current pattern allows a bit over one billion different unique combinations per prefix, and on the off chance that we do run into a duplicate, a new suffix can always just be generated for that item — although that's quite unlikely.
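As a rough illustration of the kind of short, semantics-free suffix with a check character that's being described, here is a toy Python sketch. This is not PKP's actual algorithm — the alphabet and the checksum scheme are invented for the example — but it shows the idea of an identifier with no semantic content that can still be verified for typos.

```python
import secrets

# Invented example alphabet: digits and lowercase letters without lookalikes.
ALPHABET = "abcdefghjkmnpqrstuvwxyz23456789"

def check_char(body: str) -> str:
    """Position-weighted checksum character (toy scheme, not OJS's real one)."""
    total = sum(ALPHABET.index(c) * (i + 1) for i, c in enumerate(body))
    return ALPHABET[total % len(ALPHABET)]

def make_suffix(length: int = 7) -> str:
    """Random, semantics-free suffix plus one check character (8 chars total)."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return body + check_char(body)

def is_valid(suffix: str) -> bool:
    """Catch transcription errors by recomputing the check character."""
    if len(suffix) < 2 or any(c not in ALPHABET for c in suffix):
        return False
    return check_char(suffix[:-1]) == suffix[-1]

suffix = make_suffix()
print(suffix, is_valid(suffix))      # e.g. "k4m2p9rw" True
print(is_valid(suffix[:-1] + "a"))   # a typo is very likely to be caught
```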
Before showing the deposit workflow, I want to show you how the automatic workflow works — I gave a brief example of it in the previous version, so I'll show this one as well. Let's take an item here that has not yet been through the review process, but we'll pretend for a moment that it has been accepted for publication. We record that decision, and now it has moved into copyediting. If you recall, we've set up DOIs to be assigned as soon as an item reaches the copyediting stage, so if we go over to the DOIs tab as it loads up, we should now see that the item we were just looking at has a DOI assigned. This is something that was not previously possible, and not having it ends up causing a lot more headaches than it might initially appear. Not having DOI assignment tied to any semantic information about OJS metadata is a huge win for us, and consequently means fewer problems downstream for registration agencies like Crossref.

As a final thing, I'd like to show the registration process. One other pain point from previous versions was that when you were trying to submit multiple items to be registered, they would often time out, and it was not always clear what the threshold for that was. The registration process has been reworked to use a jobs-based system, so each of the items that we want to deposit will now be queued in a jobs queue. What that means is that when we go to deposit them, the UI feedback should be instant — they've been queued for deposit — and the XML generation and all of that slow processing happens one item at a time in the background. As those jobs complete and we refresh the page, we should see, once the processes have finished, that the deposit was successful. I won't make you watch that, because it is sometimes a very long process — which is the whole point of moving it into a background job. And I believe that is all I wanted to share with you today. Thank you so much.

Great, thanks, Eric — we got it rolling in the end. I think there's a question that's just come in, from Tom, about the automatic assignment.

Yes. The way it works is that the copyediting stage would be the first opportunity for things to be assigned. When an item moves to the production stage, a DOI could also be assigned there. The idea is that this accommodates workflows where editors and journals either use or do not use the copyediting stage, but one way or another an item has to go through either the copyediting stage or the production stage before being published, so the automatic assignment will be attempted at both of those stages. This is also true if the legacy pattern generation is still in place, because that obviously requires an issue to be assigned: it will first try at the copyediting stage; if it skips that, it will try at the production stage; and if it's still not possible, it will try again on publication. So it will try as many times as possible, but it won't overwrite one that's already been assigned. Hopefully that answers your question.

Awesome, thank you. I'm conscious of time. There's another question that's just come into the chat, and I think it's really around where people can get information on upgrading to OJS 3.4 so they can take advantage — a lot of these improvements I think will really help ease a lot of those pain points. So yeah, appreciate it, thank you.

Yeah, absolutely. I'll put my email in the chat if anyone has any questions, and I will also link out to the release posts about OJS 3.4.
00:59:31.000 --> 00:59:33.000 I'll add that to the chat. 00:59:33.000 --> 00:59:46.000 Excellent. Thank you. So at this point I'm going to hand over to Luis Montilla, 00:59:46.000 --> 00:59:54.000 who's been leading things on the technical side. 00:59:54.000 --> 00:59:57.000 Sure. Oh, can you share my slides for me, please? 00:59:57.000 --> 01:00:00.000 Yes, I can indeed. Let me just grab those. 01:00:00.000 --> 01:00:23.000 Thank you. 01:00:23.000 --> 01:00:47.000 Oh, sorry, two seconds. 01:00:47.000 --> 01:01:10.000 Sorry, I've got these. 01:01:10.000 --> 01:01:18.000 Okay. 01:01:18.000 --> 01:01:32.000 I'm sorry, I just need to jump through to your slides. 01:01:32.000 --> 01:01:36.000 Yeah, if you want to go ahead and introduce your session then I will catch up with you. 01:01:36.000 --> 01:01:43.000 Sure, yeah, let's start. Perfect. Thank you very much. Well, thank you, everybody, for being here. 01:01:43.000 --> 01:01:54.000 My name is Luis Montilla. I'm the technical community manager at Crossref, and today I will be giving a very basic introduction to the Crossref API, 01:01:54.000 --> 01:02:14.000 which we will start in a few minutes. Next one, yes, perfect. Next one, please. So this presentation will be about APIs, but more importantly, it will be about the possibilities the community has to interact with the vast network of relationships between research objects, 01:02:14.000 --> 01:02:21.000 which is what we call the research nexus. In more practical terms, we are talking about metadata, 01:02:21.000 --> 01:02:32.000 which are basically the descriptors of research objects, and about how we can establish relationships between them. Next one, please. 01:02:32.000 --> 01:02:38.000 So, we make this metadata openly available via our APIs and our public data files, 01:02:38.000 --> 01:02:49.000 enabling people and machines to incorporate it into their research tools and services. And at the same time, we collect and distribute this metadata 01:02:49.000 --> 01:02:58.000 as part of the ever-growing research nexus. It's safe to assume that not everybody knows what an API actually is, 01:02:58.000 --> 01:03:05.000 so part of the purpose of this presentation is to provide a quick introduction to the use of the Crossref API. 01:03:05.000 --> 01:03:13.000 So, as the slide says, an API is a software intermediary that allows two applications to talk to each other. 01:03:13.000 --> 01:03:25.000 APIs are often compared to a waiter, facilitating communication between customers ordering items from a menu and the kitchen producing those items. 01:03:25.000 --> 01:03:30.000 Next one, please. 01:03:30.000 --> 01:03:37.000 So we make this metadata open through our REST API. You can visit the URL on the slide to access our documentation; 01:03:37.000 --> 01:03:45.000 I also shared the link in the chat and in the presentation before. And then you can use this to perform queries directly from your web browser. 01:03:45.000 --> 01:03:55.000 There is no registration required. Crossref, once again, is committed to providing access to scholarly metadata that is as open and anonymous as possible. 01:03:55.000 --> 01:04:03.000 And keep this URL to hand for the next examples, please. 01:04:03.000 --> 01:04:11.000 However, we do have some recommendations and etiquette suggestions to follow if you wish to retrieve metadata. 01:04:11.000 --> 01:04:23.000 First, by adding your email address to your request, we can contact you in case of issues.
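To make that concrete, here is a minimal sketch of such a "polite" request using Python and the requests library; the email address is illustrative and should be replaced with your own.

import requests

# The mailto parameter identifies you and routes the request to the polite pool.
resp = requests.get(
    "https://api.crossref.org/works",
    params={"rows": 0, "mailto": "you@example.org"},  # illustrative address
    timeout=30,
)
resp.raise_for_status()          # anything other than a 2xx status raises here
message = resp.json()["message"]
print(message["total-results"])  # total number of work records currently registered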
Then, additionally, we redirect these requests to a specific polite 01:04:23.000 --> 01:04:31.000 pool of servers. These servers are generally more reliable because we can more easily protect them from misbehaving scripts, 01:04:31.000 --> 01:04:40.000 in contrast with fully anonymous requests. And then there is a third alternative, especially if you are using our REST API for a production service 01:04:40.000 --> 01:04:53.000 that requires high predictability. That option is to consider using our paid Plus service, which provides you with an authentication token that directs your requests 01:04:53.000 --> 01:05:03.000 to a pool of servers that are extremely predictable. Next one, please. So for the specific examples I'm going to show you next, I'm going to use Postman, which is an API client. 01:05:03.000 --> 01:05:15.000 We don't endorse this app, but we acknowledge that it is widely used and it provides a set of functionalities that makes metadata retrieval much easier. 01:05:15.000 --> 01:05:26.000 However, you can also execute these queries directly in your web browser, once again. In most cases, your browser will display the results in a JSON format; 01:05:26.000 --> 01:05:32.000 however, in some instances you might need to install a browser extension. 01:05:32.000 --> 01:05:40.000 Next one, please. So before moving on, I would like to take a second to establish some basic vocabulary 01:05:40.000 --> 01:05:48.000 that you will encounter throughout this presentation. So the text that you see here at the top represents a query. 01:05:48.000 --> 01:05:55.000 The first part, coloured in yellow, indicates the server; this identifies the entity providing the service, 01:05:55.000 --> 01:06:00.000 and all of our Crossref-related queries share this common root. Then you have the endpoint. 01:06:00.000 --> 01:06:13.000 This is a key term that we will use to identify the digital locations that receive a connection, and I will show you soon the other endpoints that we have available to the general community. 01:06:13.000 --> 01:06:20.000 And then finally we have the parameters. You can identify them as being preceded by a question mark, and they follow the notation of having a key 01:06:20.000 --> 01:06:31.000 and a value. So, for example, in this specific example you can see a query that has the mailto parameter. 01:06:31.000 --> 01:06:42.000 The parameters basically allow you to specify the resources you are retrieving and also to perform operations such as, in this case, identifying yourself. 01:06:42.000 --> 01:06:53.000 Next one, please. So if we check the documentation, we'll find a list of the different endpoints that our API contains that may be of interest. 01:06:53.000 --> 01:06:59.000 In this presentation, I'll show you some basic examples of the data that you can retrieve using some of these. 01:06:59.000 --> 01:07:11.000 Perhaps you noticed that in the previous slide I was using the funders endpoint, but we have endpoints specifically for journals, works, members and several others. 01:07:11.000 --> 01:07:15.000 Next one, please.
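Spelling that anatomy out in code: a minimal sketch, again with an illustrative email address, that labels the three parts of the query Luis has just described and sends it.

import requests

SERVER = "https://api.crossref.org"          # the entity providing the service
ENDPOINT = "/funders"                        # the location that receives the connection
PARAMETERS = {                               # key=value pairs, after the "?" in a URL
    "query": "German Research Foundation",   # what we want to search for
    "mailto": "you@example.org",             # identifies you (polite pool)
}

resp = requests.get(SERVER + ENDPOINT, params=PARAMETERS, timeout=30)
print(resp.status_code)  # 200 means the query was processed correctly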
01:07:15.000 --> 01:07:22.000 Now let's explore some basic examples. We'll start with funder-to-article relationships. 01:07:22.000 --> 01:07:29.000 This is interesting because funders do not automatically know when the work that they have funded is published, and it's also essential for reporting 01:07:29.000 --> 01:07:46.000 on the impact of grants. Finding this information can be challenging through other means, because publishers and institutions do not systematically report it. 01:07:46.000 --> 01:07:50.000 Next one, please. This is not going to be a live demo; 01:07:50.000 --> 01:08:04.000 it's going to be done through screenshots. So, as you can see here, we include in the top part the string that contains our server, the endpoint that I was mentioning before (the funders endpoint) 01:08:04.000 --> 01:08:18.000 and other parameters. In this specific case, I'd like to start with a very general search, so I include the mailto parameter to make a polite request and I add a query with the text German Research Foundation. 01:08:18.000 --> 01:08:27.000 Next one, please. So when you submit this query, you will receive a status code; 01:08:27.000 --> 01:08:35.000 in this case everything is okay. And then, helpfully, we can see how many items we are retrieving with this query, which I'm highlighting here: 01:08:35.000 --> 01:08:52.000 you can see that we are retrieving four results. Next one, please. So if we examine the output in detail (hopefully it has a good enough resolution), we will notice that the name appears as part of the field alternative names, 01:08:52.000 --> 01:08:59.000 and the organization that we are looking for appears at the top. And from here we can take note of the ID number; 01:08:59.000 --> 01:09:05.000 we can then use this to refine our queries. Next one, please. 01:09:05.000 --> 01:09:11.000 So, for example, if we use this ID as an additional path level in our funders endpoint, 01:09:11.000 --> 01:09:25.000 we can retrieve organization-specific data. For example, here we can see that almost 200,000 works have the German Research Foundation as part of their funding organizations. 01:09:25.000 --> 01:09:35.000 Next one, please. Thank you. So we can take a few seconds in case you want to try this, or of course if you're watching the recording you can pause the presentation. 01:09:35.000 --> 01:09:46.000 Please visit this URL, scroll down to the funders section, click GET, and please remember to add your email to the mailto field, 01:09:46.000 --> 01:10:00.000 and finally include the name of any funding organization that you wish in the query field. Let's have a few seconds. 01:10:00.000 --> 01:10:04.000 Okay, next one, please. 01:10:04.000 --> 01:10:09.000 So this screenshot shows one possible result. Please notice that you should get the status code 200, indicating that your query was processed correctly, 01:10:09.000 --> 01:10:21.000 and what is called here the response body contains your data. 01:10:21.000 --> 01:10:29.000 Next one, please. If you're pasting these queries directly into the search bar of your web browser, it will likely look like this. 01:10:29.000 --> 01:10:36.000 The latest versions of Firefox, Chrome and several others should let you visualize this JSON file; 01:10:36.000 --> 01:10:47.000 I was doing some tests in Safari and that was the one that required installing an extension, so consider this. 01:10:47.000 --> 01:10:53.000 Next one. Thank you. So we can refine this output a little bit more by adding additional filters and field queries. 01:10:53.000 --> 01:11:04.000 Once again, these are listed in our documentation. Some of these accept text strings, some others are numbers, and others are boolean expressions, meaning true or false.
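Putting the walkthrough together in one place: a minimal Python sketch that runs the funder search, picks out the funder ID, and then counts that funder's works, adding the filter and facet parameters Luis turns to next. The email address is illustrative and the filter values are only examples to adjust.

import requests

BASE = "https://api.crossref.org"
POLITE = {"mailto": "you@example.org"}  # illustrative address

# Step 1: search the funders endpoint and note the ID of the organisation you want.
search = requests.get(f"{BASE}/funders",
                      params={"query": "German Research Foundation", **POLITE},
                      timeout=30).json()["message"]
for item in search["items"]:
    print(item["id"], item["name"])     # inspect the handful of matches returned

funder_id = search["items"][0]["id"]    # here we simply take the top result

# Step 2: use that ID as an extra path level and count the matching works,
# narrowed to journal articles with a full-text link and faceted by year.
works = requests.get(
    f"{BASE}/funders/{funder_id}/works",
    params={
        "rows": 0,                                        # we only want the counts
        "filter": "type:journal-article,has-full-text:true",
        "facet": "published:*",
        **POLITE,
    },
    timeout=30,
).json()["message"]

print(works["total-results"])  # how many works match the filters
print(works["facets"])         # aggregated counts, e.g. per publication year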
01:11:04.000 --> 01:11:09.000 And of course they can be combined in different ways. Next one, please. 01:11:09.000 --> 01:11:19.000 So, for example, let's imagine that we are interested in knowing how many of the selected organization's funded journal articles are publicly available. 01:11:19.000 --> 01:11:27.000 In this case we can add one filter that includes the type element; here we are setting it to journal-article. 01:11:27.000 --> 01:11:37.000 And we are also adding a filter asking whether the record contains a full-text element, and we are setting this to false. 01:11:37.000 --> 01:11:51.000 Next one, please. So you can try combining some of these. I'm showing you here the example that I just used. 01:11:51.000 --> 01:12:07.000 And then, of course, we can combine these fields further. For example, we can also query for the specific grants that are available in the works records. 01:12:07.000 --> 01:12:23.000 Next one, please. We can also aggregate data using the facet fields. In this case we can use this, for example, to aggregate results by publication year. 01:12:23.000 --> 01:12:34.000 Next one. And here I'm showing the query in case you want to replicate these examples. 01:12:34.000 --> 01:12:41.000 And with this slide I will finish my presentation. Feel free to contact me; I will add my details in the chat. 01:12:41.000 --> 01:12:54.000 I hope this presentation was helpful to anyone who was not aware of how to use an API before. Thank you very much. 01:12:54.000 --> 01:13:07.000 That was excellent, thank you very much. And again, just to reiterate, we'll share the slides and the recordings so that you can dig into this more. 01:13:07.000 --> 01:13:22.000 Luis joined the team at Crossref a couple of months ago, so if you enjoyed that demo, there'll be loads more to come in future to help people get to grips with how to use the Crossref metadata. 01:13:22.000 --> 01:13:31.000 Do put questions in the chat or in the Q&A and we'll take those up. In the meantime, I'm going to keep screen sharing, 01:13:31.000 --> 01:13:48.000 and I'm going to talk just really quickly about the work that we've been doing with the data that's come from Retraction Watch, and Martin Eve is going to help me out 01:13:48.000 --> 01:13:53.000 as and when I get stuck. 01:13:53.000 --> 01:14:04.000 Just for context, in case you weren't aware, Crossref acquired and opened the Retraction Watch data in the middle of September, 01:14:04.000 --> 01:14:12.000 and I've linked to the blog post that describes that in more detail. 01:14:12.000 --> 01:14:22.000 The other thing I will do as well is share a link to the webinar we ran on the context of why this information is important, 01:14:22.000 --> 01:14:30.000 which took place last month. So we'll share those, which will give you lots more context than we have time for today. 01:14:30.000 --> 01:14:34.000 There are lots of things that we know and expect our community to do with more comprehensive, complete information on retractions, 01:14:34.000 --> 01:14:56.000 and so we wanted to make the data available, you know, in a very simple CSV format, but also via our Labs API, so that people can start to get to grips with it early doors. 01:14:56.000 --> 01:15:12.000 So, as I said, as well as the CSV format,
we made the data available, or more accurately Martin made it available, via our Labs API. 01:15:12.000 --> 01:15:20.000 What I mean when I say the Labs API is that it's an environment where we test stuff out, 01:15:20.000 --> 01:15:26.000 adding new data and new representations of existing data. And I think the main message of this presentation is that it's where we'd love some feedback on how we are displaying 01:15:26.000 --> 01:15:39.000 and representing the data. There's a little snapshot of it over on the right-hand side of this slide. 01:15:39.000 --> 01:16:01.000 As Luis mentioned, we ask you to use a mailto, so your email address, when you're accessing the data, and the information is available, as you can see, via the works route in the Labs API. 01:16:01.000 --> 01:16:02.000 I've pulled some of the information out just to try to make it a little bit clearer. Martin, I'm happy, again, 01:16:02.000 --> 01:16:18.000 for you to talk to the pieces that I've missed. But what sort of jumped out at me, as someone who uses and is really interested in this, is that we've got a couple of fields here. 01:16:18.000 --> 01:16:27.000 So we've got the ROR ID of the Center for Scientific Integrity, who are the folks behind the Retraction Watch data. 01:16:27.000 --> 01:16:35.000 I think that's a nice thing, because again we've got ROR identifiers and we want to be able to use those to uniquely and accurately identify institutions. 01:16:35.000 --> 01:16:52.000 We're trying to be really clear about who made the assertion that an article has been retracted, because in many instances we've got a retraction being asserted by Retraction Watch 01:16:52.000 --> 01:17:04.000 that isn't currently reflected in the publisher metadata. And again, of course, we really hope and expect that that will change over time, 01:17:04.000 --> 01:17:18.000 and we're thinking about how we model retractions, full stop. So, as I said, Martin implemented the workflow and crunched the Retraction Watch data to make it available via these routes. 01:17:18.000 --> 01:17:28.000 So thank you very much. But do you want to say more about the current implementation and the plans going ahead? 01:17:28.000 --> 01:17:35.000 Sure. So there are some limitations around what we've got at the moment. For instance, it's not easy to do filters in our Labs API system, 01:17:35.000 --> 01:17:47.000 so this should not at this point be a full replacement for the production API, and you shouldn't expect the same stability or performance. 01:17:47.000 --> 01:17:54.000 What it does do, though, is let us test whether the information that we're representing is in a format that's of use to our community. 01:17:54.000 --> 01:18:03.000 So what I would urge is: if you are someone who's been making use of the Retraction Watch data and you're thinking about using it in an API format, 01:18:03.000 --> 01:18:18.000 have a go at using the Labs API and see whether it meets your needs, and give us feedback on what's missing, what you need, what's there, what's good. That would all be incredibly helpful for us in thinking about how we move this through to production 01:18:18.000 --> 01:18:23.000 at a later stage. 01:18:23.000 --> 01:18:28.000 Is that enough, Rachel? I'm not sure what else I can add there at this point.
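For anyone who wants to try that, here is a minimal sketch of what a look-up might look like. The base URL and the placeholder DOI are assumptions on my part (check the blog post and the links shared in the chat for the exact Labs address); the session only confirms that the data is exposed via a works route and that a mailto is appreciated.

import requests

LABS_BASE = "https://api.labs.crossref.org"   # assumed base URL, confirm against the announcement
DOI = "10.xxxx/example"                       # placeholder: use the DOI of a retracted article

record = requests.get(
    f"{LABS_BASE}/works/{DOI}",
    params={"mailto": "you@example.org"},     # illustrative address
    timeout=30,
).json()["message"]

# The Retraction Watch assertions (who asserted the retraction, the Center for
# Scientific Integrity's ROR ID, and so on) sit alongside the usual work metadata.
# Inspect the record and tell the Labs team what is missing or unclear.
print(sorted(record.keys()))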
01:18:28.000 --> 01:18:29.000 Okay. 01:18:29.000 --> 01:18:39.000 I think that's more than enough, and we will again pass on information on how you can get in touch with us about that, and I know that lots of people have already, so I think that's really important. 01:18:39.000 --> 01:18:57.000 And again, please feel free to jump into the chat or Q&A and ask us questions about the data, and we'll also post a link to the webinar from a couple of weeks ago on that. 01:18:57.000 --> 01:19:03.000 So we just want to flag that this is an area where we're really keen for feedback, 01:19:03.000 --> 01:19:10.000 so do go ahead and give us that. 01:19:10.000 --> 01:19:11.000 So, Martin, I'm going to leave you to your own devices for the next demo. 01:19:11.000 --> 01:19:30.000 You've been doing work to analyse the preservation status of content registered by Crossref members, so I'll leave it to you to share your analysis and your thinking around that. 01:19:30.000 --> 01:19:31.000 Thank you. 01:19:31.000 --> 01:19:41.000 Thank you very much. Just bear with me a second while I try to work out whether I'm going to share the correct monitor or whether I'm going to show everyone all my emails. 01:19:41.000 --> 01:19:49.000 Let's try this one. Have you got a presentation there, Rachel, or... 01:19:49.000 --> 01:19:51.000 A presentation and no emails, so perfect. 01:19:51.000 --> 01:19:56.000 That was a good guess then, on one out of three. Okay. Thanks, everyone. 01:19:56.000 --> 01:20:05.000 So I'm going to talk today a little bit about the state of digital preservation, and a study that we conducted across 7 million DOIs. 01:20:05.000 --> 01:20:15.000 This is a really important aspect of Crossref's practice that sometimes goes under-remarked, which is that for Crossref to provide a stable linking service between different platforms, we need to know that 01:20:15.000 --> 01:20:24.000 material has been safely digitally preserved. Why is that? Because if the original source goes offline, where is the DOI going to end up pointing? 01:20:24.000 --> 01:20:36.000 If there's no digital preservation source to which we can redirect the DOI, then actually the persistence of our URLs is limited to the lifetime of the original 01:20:36.000 --> 01:20:40.000 HTTP endpoint. 01:20:40.000 --> 01:20:49.000 So when you join Crossref, one of the conditions of membership is that you will make best efforts to ensure that your material is safely digitally preserved. 01:20:49.000 --> 01:21:02.000 But we don't know what proportion of scholarly objects assigned DOIs are actually adequately preserved in a recognised dark archiving system like CLOCKSS, LOCKSS, Portico or even the Internet Archive. 01:21:02.000 --> 01:21:05.000 We don't really know how stable the preservation systems that underpin the persistence of these persistent identifiers are at the moment. 01:21:05.000 --> 01:21:18.000 We haven't yet had a mass extinction event of DOIs, or anything that has really caused us to use those systems en masse. 01:21:18.000 --> 01:21:25.000 We don't know which parts of our membership are behaving well on this theme: who does digital preservation well and who does it badly? 01:21:25.000 --> 01:21:31.000 We don't even really know, in the era of the DOI, who should be doing this. Is it libraries? 01:21:31.000 --> 01:21:44.000 Is it publishers? Has that changed since the print era? Should it be both of those parties?
And most importantly, perhaps, with a slightly apocalyptic tone again: is it already too late? 01:21:44.000 --> 01:21:56.000 Is the material that is not preserved material that we're now going to struggle to get into some preservation mechanism at this late stage in the day? 01:21:56.000 --> 01:22:08.000 So I set about building an item-level preservation database system that would allow us to iterate over a sample of DOIs and to figure out where these things are preserved. 01:22:08.000 --> 01:22:17.000 There's an existing system called the Keepers Registry that performs a similar function, but it doesn't have an openly usable API that we could query at scale 01:22:17.000 --> 01:22:29.000 for a large number of DOIs. It also works at the container level, so it will tell you that this issue and this volume of this journal are preserved, not whether this DOI is preserved. 01:22:29.000 --> 01:22:37.000 So I had to build something that would translate between those layers. I built a system that incorporates the archives that you can see on this slide: 01:22:37.000 --> 01:22:48.000 Cariniana, which holds predominantly Latin American material; CLOCKSS, the controlled Lots Of Copies Keep Stuff Safe archive; 01:22:48.000 --> 01:23:02.000 HathiTrust; the Internet Archive; LOCKSS, the more general version of CLOCKSS; PKP's private LOCKSS network, which is a system PKP implemented to help Open Journal Systems users ensure their material is safe; 01:23:02.000 --> 01:23:10.000 Portico, another dark archive; and the OCUL Scholars Portal in Canada. 01:23:10.000 --> 01:23:17.000 I also needed some way of figuring out how to score members on how they're doing on digital preservation. 01:23:17.000 --> 01:23:23.000 I mean, it's all very well to just say this is what's going on, but you need some way of ranking them and understanding the categorisation. 01:23:23.000 --> 01:23:33.000 So I invented a scale, because there is no standardised scale for digital preservation, where I gave gold medals to those that have 75% of their content digitally preserved in three or more recognised archives. 01:23:33.000 --> 01:23:43.000 Silver members, in my scale, are those with 50% of their content in two or more recognised archives; 01:23:43.000 --> 01:23:51.000 bronze, 25% in one or more recognised archives; and unclassified, those that don't even meet the bronze categorisation. 01:23:51.000 --> 01:24:04.000 So this is a scoring mechanism that works across two axes at the same time: the percentage preserved, and the number of archives holding duplicate preservations. 01:24:04.000 --> 01:24:12.000 The very best in this scheme would be to have all of your content in three or more of the recognised archives that I've been working with, 01:24:12.000 --> 01:24:21.000 whereas the worst would be to have less than 25% in even one archive. 01:24:21.000 --> 01:24:31.000 So what does it look like overall? I looked at 7 million DOIs from our sampling framework and I found some results that are pretty disturbing in some ways, but perhaps not unexpected to 01:24:31.000 --> 01:24:46.000 those who've been looking at this for a while. For those in the real danger zone, the unclassified space with less than 25% in even one archive, we found that approximately 33%, a third of all the content 01:24:46.000 --> 01:24:56.000 from members that I looked at, was in that group. Meanwhile bronze, which, you know, is okay, is one archive with at least 25% in it, and that made up
01:24:56.000 --> 01:25:12.000 approximately 57.7% of the members that we examined. Silver, again, is a much smaller percentage, 8.46%, and only about 2% or so made it into that gold category, 01:25:12.000 --> 01:25:26.000 the gold star system at the top there. So there's a worrying preservation picture emerging across Crossref members that we perhaps need to do something about. 01:25:26.000 --> 01:25:37.000 I wanted to know also which types of members were behaving in which ways. There's a variety of ways in which you can classify the size of Crossref members. 01:25:37.000 --> 01:25:49.000 We can do it by revenue. So this is a chart that shows the breakdown from those at the very lowest levels of revenue at the top through to those at the highest level at the bottom here. 01:25:49.000 --> 01:26:05.000 And I think what you can see is that the silver bar, which is perhaps a key sign that people are doing things well, with at least two archives backing the majority of their material, increases substantially as we get down to those with more resources. 01:26:05.000 --> 01:26:19.000 Perhaps hearteningly, though, even among those of our members with the lowest resources we do see that bronze bar, where they are conducting at least one form of digital preservation for some of their material. 01:26:19.000 --> 01:26:32.000 So there is an awareness growing, even among all types of publisher members, that they should be doing digital preservation and that it's something that does matter for their membership of Crossref. 01:26:32.000 --> 01:26:43.000 You can also do this by number of member deposits, but I think the picture again is clear. You can see the silver bar grows substantially as we get down to the bigger members who are doing more deposits. 01:26:43.000 --> 01:26:51.000 The bronze bar likewise; it kind of goes down again at the end because the silver bar is eating it up, which is a good sign. 01:26:51.000 --> 01:27:02.000 But if you take bronze and silver collectively as a marker of who's doing preservation, apart from the very last size bar we've actually got a good level of growth as we go down there. 01:27:02.000 --> 01:27:16.000 But clearly there are a number of smaller members here, in this type of categorisation, who need us to have a chat with them about what they're doing for digital preservation, because at the moment their material is not particularly safe. 01:27:16.000 --> 01:27:19.000 You can also look at this on the works front; that was looking at members as a whole rather than at the number of actual works. 01:27:19.000 --> 01:27:30.000 We found that approximately 60% of works were preserved at least once, so 60% of the content in that sample had some kind of backup 01:27:30.000 --> 01:27:44.000 system in place that could rescue it if the site went down. But that left approximately 30% in our sample that seemed unpreserved and at serious risk, 01:27:44.000 --> 01:27:59.000 with an exclusion of 14% for being too recent (they're in the current year, so they wouldn't yet be in digital preservation archives), for not being journal articles, or for having insufficient date metadata for us to identify the source. 01:27:59.000 --> 01:28:12.000 So what are we going to do about this? We've got a forthcoming peer-reviewed paper that sets out the data behind this and gives you the opportunity to read in more detail about the processes that we used to come to these conclusions.
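As a minimal sketch of the medal scale Martin describes above, scored across the percentage of a member's content preserved and the number of recognised archives holding it: the thresholds are the ones given in the talk, while the function name and inputs are just illustrative.

def preservation_grade(percent_preserved: float, archive_count: int) -> str:
    """Grade a member on the two-axis scale from the talk (illustrative only)."""
    if percent_preserved >= 75 and archive_count >= 3:
        return "gold"          # 75%+ of content in three or more recognised archives
    if percent_preserved >= 50 and archive_count >= 2:
        return "silver"        # 50%+ in two or more archives
    if percent_preserved >= 25 and archive_count >= 1:
        return "bronze"        # 25%+ in at least one archive
    return "unclassified"      # does not even meet the bronze bar

print(preservation_grade(80, 3))   # gold
print(preservation_grade(30, 1))   # bronze
print(preservation_grade(10, 0))   # unclassified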
01:28:12.000 --> 01:28:24.000 We've got some lay and easy-read summaries of the preservation situation coming out as well, so that we can start to spread the word even among non-technical users or less technically minded publishers. 01:28:24.000 --> 01:28:40.000 We'd like to conduct some direct member outreach. I think it's really important that we, in a non-confrontational way, get in touch with people who are not yet doing this well and work with them, because it benefits everybody to have secure and persistent linking 01:28:40.000 --> 01:28:51.000 and the other types of work that use DOIs. We're also working on an experimental system, Project Op Cit, which I've just started work on in the last couple of weeks. 01:28:51.000 --> 01:28:59.000 This is a project that aims to integrate DOI deposit with ingestion into digital preservation archives. 01:28:59.000 --> 01:29:01.000 It then tries to monitor an endpoint, a DOI resolution point, to see whether it's gone down, 01:29:01.000 --> 01:29:12.000 and to tell people: this is a problem, you can now view it at this archive instead if you like. 01:29:12.000 --> 01:29:22.000 So if you Google for that project and Crossref, you should find the specification document and the call for comment on it. 01:29:22.000 --> 01:29:28.000 If you want to feed back on that, it's not too late; do get in touch. And the full charts and dataset behind these findings 01:29:28.000 --> 01:29:39.000 are available at the website shown there on the slide. 01:29:39.000 --> 01:29:45.000 Thank you very much. I hope that was of interest. 01:29:45.000 --> 01:30:00.000 Excellent, thank you. So I think there's already interest in zooming in on the charts and being able to find more information on that. 01:30:00.000 --> 01:30:15.000 So, Rhiannon, do go and check the link that Lisa has just posted in the chat, and again, let us know if you've got follow-up questions. 01:30:15.000 --> 01:30:18.000 Anything more? No. 01:30:18.000 --> 01:30:28.000 Thank you. 01:30:28.000 --> 01:30:29.000 So, 01:30:29.000 --> 01:30:33.000 Martin, are you hanging around for the next 15-20 minutes? So if you're mulling over a question, he'll still be here if you want it answered more live. 01:30:33.000 --> 01:30:50.000 But yes, it's a really interesting analysis and there are definitely follow-up options that we can see, both as Crossref and as a community, I expect. 01:30:50.000 --> 01:31:01.000 And there's a question that's just jumped into the Q&A as well. 01:31:01.000 --> 01:31:19.000 I think this is because of bad metadata on locations. The question was that there are sometimes cities or regions in the location, or what we've called country, because it's supposed to be a country field but sometimes people put more detailed information in there. 01:31:19.000 --> 01:31:37.000 We've done our best, but, you know, I could spend a lot of time cleaning up that data; with 7 million records coming in you basically have to take the location fields with a pinch of salt, and we've done our best on location. 01:31:37.000 --> 01:31:51.000 Thank you. So the final presentation, or demo, for this session is DOIs for static sites. 01:31:51.000 --> 01:31:59.000 Esha Datta from our Labs team is going to give us a quick demo, an overview of the static site 01:31:59.000 --> 01:32:13.000 identifier generator, using the Crossref website as a test case.
I appreciate it's a semi-live demo of stuff that is in active development, but again, I think you've put some safeguards in place as well. 01:32:13.000 --> 01:32:19.000 So I'll hand over to you to talk us through it. 01:32:19.000 --> 01:32:26.000 Alright, great. Thank you so much, Rachel. I'm going to share my screen. 01:32:26.000 --> 01:32:30.000 Can everyone see my screen? 01:32:30.000 --> 01:32:32.000 Yeah, we can see the slide deck. Okay. 01:32:32.000 --> 01:32:47.000 Yeah, that's great. So I'm going to just talk about this a little bit and then do a video demo, because some of the processing takes a little time and that's boring. 01:32:47.000 --> 01:32:55.000 So basically I am going to give a demo of this lovely, wordy piece of software called the static page ID generator, because I couldn't think of a better name. 01:32:55.000 --> 01:33:05.000 It was developed to help create PIDs, such as DOIs, for blogs or other research materials built with static site generators such as Hugo, Jekyll or others. 01:33:05.000 --> 01:33:26.000 Currently the software is being tested with Hugo. We decided to develop the software because increasingly at Crossref we are seeing a need for permanent URLs for our own use on our website, and we could see a use for the wider community as well. 01:33:26.000 --> 01:33:33.000 Our website is built using Hugo. The software 01:33:33.000 --> 01:33:45.000 can track a Git repository of the static site in question, and if a user specifies a path, it will focus on any files under that path. 01:33:45.000 --> 01:33:51.000 It'll generate unique IDs for the files that a user wants tracked, and if the user is a Crossref member it'll build the XML deposit files, 01:33:51.000 --> 01:34:02.000 deposit the metadata for the files into Crossref, register the DOIs, and add the DOI back to the file in question. 01:34:02.000 --> 01:34:06.000 If you're not a Crossref member, for now basically all it'll do is features one and two, which is track the files and generate a unique ID. 01:34:06.000 --> 01:34:21.000 But we are developing a plug-in architecture so that it can accommodate other use cases, other registration agencies, and also other types of unique IDs that you'd want. 01:34:21.000 --> 01:34:33.000 So I'm going to start with the end result of this, which is that here is an example page on the test version 01:34:33.000 --> 01:34:49.000 of the Crossref site. I basically just created this very lovely test page and ran the script against it, which then deposited this metadata into Crossref, which you can see in the API. 01:34:49.000 --> 01:35:04.000 And here's the DOI for that. It can also then be used in downstream applications such as the metadata search; if you did a search on this, you would find the same 01:35:04.000 --> 01:35:16.000 article, and if I go back to it, here's the file. So basically I'm going to show you two ways in which I use the script. 01:35:16.000 --> 01:35:26.000 The first is in the GitLab repository which contains the test version of the site, and the second is the command-line version. 01:35:26.000 --> 01:35:32.000 In this repository, we run the script using GitLab's continuous integration setup; 01:35:32.000 --> 01:35:38.000 if you have a repository that's in GitHub, it has a similar process. 01:35:38.000 --> 01:35:46.000 So in the GitLab repository I first ran a process that creates a config file which contains the configuration information.
01:35:46.000 --> 01:35:55.000 Let me see, I have it queued up in my video. So here's the config file. Here you see that I'm asking the script to track everything in the repo, 01:35:55.000 --> 01:36:05.000 I've put in a test DOI prefix, 01:36:05.000 --> 01:36:21.000 I have this crazy long domain, which is basically just the test version of the site's domain, I'm specifying that I want an ID type of DOI, and the JSON file basically contains all of the information about the files that are being tracked. 01:36:21.000 --> 01:36:33.000 And now I'm going to play the video. I'm not sure how much time we have, so I'll check in again after I've finished the Git repository part of the demo, and if we have more time I can also show you the command-line part of the demo. 01:36:33.000 --> 01:36:41.000 Cool. Yeah, I think we're okay for time. 01:36:41.000 --> 01:36:42.000 Yep. 01:36:42.000 --> 01:37:04.000 Okay, cool. To start tracking a file, I created this file, which is new, and basically added some front matter, so it includes some labels that are recognised by the Hugo site, and also, crucially, I added this x-version tag, which basically tells the script to start tracking it. 01:37:04.000 --> 01:37:14.000 The value to the right of x-version is in semantic version style: basically the first 0 tells you that this is the major version, 01:37:14.000 --> 01:37:35.000 the next one is the minor version, and this is the patch. So the script looks for a default version of 0.0.0, and from that it knows that this is a new file and that it should begin tracking it. 01:37:35.000 --> 01:37:42.000 So once I push this file, I get a process that runs this job; it pulls down a Docker image and runs the script, essentially. 01:37:42.000 --> 01:37:56.000 So we are giving it some credentials, and then we are also saying that it needs to check the repository, 01:37:56.000 --> 01:38:05.000 over here that the submission type is Crossref, and that it needs some more submission information, 01:38:05.000 --> 01:38:26.000 our deposit information, for the script to deposit the metadata that you will be using eventually. So what it does is it adds the file information to the JSON file, it deposits the XML, 01:38:26.000 --> 01:38:33.000 and then it also creates a DOI and adds it back to this file. And you know that it has all worked because the job has succeeded; it has run successfully. 01:38:33.000 --> 01:38:41.000 We can check the page again, and we can see that a DOI has been added to this page. 01:38:41.000 --> 01:39:00.000 And we can also see that, once the site has been deployed to production, if we click on this and go to this DOI, it takes you to the site, 01:39:00.000 --> 01:39:10.000 to the page that's been deployed to production. So this is basically how you would run it 01:39:10.000 --> 01:39:20.000 with continuous integration on a GitLab-hosted repository. There is a similar operation that you would do if it were hosted on GitHub as well, because GitHub also has a continuous integration 01:39:20.000 --> 01:39:29.000 setup. 01:39:29.000 --> 01:39:39.000 Now for running the script on the command line. I have cloned the static page ID generator onto my computer, 01:39:39.000 --> 01:39:52.000 I have installed all the requirements in a virtual environment, and in another terminal I have my static site repository.
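To illustrate the front-matter convention Esha describes above, here is a small Python sketch (using PyYAML) that detects the opt-in tag in a Hugo page. The key name x-version and the 0.0.0 "new file" convention are how I understood the demo, not a published specification.

import re
import yaml  # PyYAML

FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)

def needs_tracking(page_source: str) -> bool:
    """True if the page opts in with an x-version tag still at 0.0.0 (assumed convention)."""
    match = FRONT_MATTER.match(page_source)
    if not match:
        return False
    meta = yaml.safe_load(match.group(1)) or {}
    return str(meta.get("x-version", "")).strip() == "0.0.0"

page = """---
title: A test page
x-version: 0.0.0
---
Body of the Hugo page goes here.
"""
print(needs_tracking(page))  # True, so the script would start tracking this file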
01:39:52.000 --> 01:40:04.000 I first need to initialise the repository for the script, which generates two files that help the script track the versioned files. 01:40:04.000 --> 01:40:14.000 To run the initialisation I run this command, which requires a few arguments that tell the script a few things: 01:40:14.000 --> 01:40:31.000 it tells the script to track this particular repository and to track this path in the repository, which is content, and it gives it the production domain that it needs to generate a DOI, 01:40:31.000 --> 01:40:39.000 and a DOI prefix. Once that has run, it will create two files: this one is to track the files, 01:40:39.000 --> 01:40:43.000 and this one essentially contains all the values needed to generate DOIs. I already ran this, so there is a config.yaml 01:40:43.000 --> 01:41:01.000 and a pid.json. The config.yaml file looks like this, and it's just an empty pid.json file because I haven't run anything yet. 01:41:01.000 --> 01:41:11.000 So, in order to let the script know that I want a file to be tracked, I add this particular tag with its value in the front matter. 01:41:11.000 --> 01:41:20.000 I tell the script that I want this to be versioned: I add an x-version tag to it, and I follow the semantic versioning 01:41:20.000 --> 01:41:50.000 style, which is major version, minor version and patch version. Currently we have functionality that looks to see whether the version number belonging to this tag is 0.0.0, and if so it will start tracking it. Then I run the script, after telling it which files we want to be tagged. 01:41:51.000 --> 01:42:01.000 We do this by again specifying the repository and path. We're telling it that the submission type is going to be Crossref, because we're going to deposit this into Crossref, 01:42:01.000 --> 01:42:12.000 and then we have another file that gives the submission script some more information to make the submission successful. 01:42:12.000 --> 01:42:28.000 So it's going to track the files, and it has now submitted the file, and now it's going to check the DOI registration. 01:42:28.000 --> 01:42:37.000 Once it runs, it checks the DOIs and then it adds the DOI back to the file, which you can see 01:42:37.000 --> 01:42:48.000 over here. So now we know that this file has a DOI and can be accessed once the website has been deployed to 01:42:48.000 --> 01:42:57.000 production. So that's how you would run it in the CLI. I also wanted to show you what it would look like if I 01:42:57.000 --> 01:42:59.000 check the DOI from the command line. So this is a page that has actually been deployed 01:42:59.000 --> 01:43:16.000 to production, on the test website that I have, so if I copy and paste this, or just follow the link, it will show me the page that has been deployed. 01:43:16.000 --> 01:43:30.000 And as you can see, the DOI is listed on the page here, and this is all from my testing, of course. 01:43:30.000 --> 01:43:33.000 So that's basically the demo for this. Thank you for bearing with the video and the extraneous sounds. 01:43:33.000 --> 01:43:45.000 As for future steps: it's still in active development and we still have a ways to go, including incorporating it into our 01:43:45.000 --> 01:43:57.000 website workflow. I'm planning on converting this into a library and allowing plug-in functionality so that it can accommodate 01:43:57.000 --> 01:44:00.000 other types of use cases as well.
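As a quick aside, checking where a freshly registered DOI resolves to can also be scripted; a minimal sketch, where the DOI below is only a placeholder:

import requests

doi = "10.xxxx/your-new-page-doi"   # placeholder: substitute a DOI you have registered

resp = requests.get(f"https://doi.org/{doi}", allow_redirects=True, timeout=30)
print(resp.status_code)  # 200 once the landing page is live
print(resp.url)          # the production URL the DOI currently resolves to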
And I'm happy to hear more feedback from all of you and, you know, to see how we can collaborate better on this. 01:44:00.000 --> 01:44:14.000 Thank you so much. 01:44:14.000 --> 01:44:26.000 That's great. Thank you very much. It all worked very, very smoothly. And there are a couple of, hopefully, quick questions in the Q&A, 01:44:26.000 --> 01:44:30.000 if you want to call those up. 01:44:30.000 --> 01:44:31.000 Or I can just... 01:44:31.000 --> 01:44:37.000 Cool. Yeah, is it the "do you have thoughts" one? Is that the one? 01:44:37.000 --> 01:44:46.000 Yep, the thoughts on the archiving of the final rendered static page, with images, for example. 01:44:46.000 --> 01:45:06.000 That's a really good question. I haven't really thought about it yet. Yeah, that is super interesting, and we could also talk to Martin Eve about possibly looking into how to preserve this, so that's a nice tie-in for sure. 01:45:06.000 --> 01:45:13.000 And then there's a second question about... actually, Martin, I will let you jump in, I've seen you've unmuted. 01:45:13.000 --> 01:45:16.000 Let's take a bit more time on that question. 01:45:16.000 --> 01:45:18.000 I was just going to draw attention to Martin Fenner's work, which was mentioned earlier today, I think in a different session, 01:45:18.000 --> 01:45:34.000 and which has been looking into the preservation of blogs and static sites. So there's already some experimentation around the preservation function here. 01:45:34.000 --> 01:45:45.000 But you're right, we've got an ideal point now, if we're assigning a DOI to these items, to think about how we might get them into an archive so that we know the stuff is preserved at that point. 01:45:45.000 --> 01:46:02.000 Because obviously if we just open the floodgates and say everyone can have a DOI for this content, but they're then unaware that there's that preservation responsibility, then we're in quite a difficult position. 01:46:02.000 --> 01:46:12.000 And then there's another question: how does this tool handle multiple resolution? 01:46:12.000 --> 01:46:13.000 Yeah, again, a very good question. Currently it's still very much in active development, 01:46:13.000 --> 01:46:29.000 so we should definitely talk more about all of those types of use cases. 01:46:29.000 --> 01:46:41.000 Great. I will give people a few more minutes; the next session will start on the hour. 01:46:41.000 --> 01:46:49.000 I'll give people a few more minutes in case there are any final questions, but I don't want to miss the opportunity to say thanks to all of our fantastic presenters and also for your really good questions. 01:46:49.000 --> 01:47:07.000 I hope this gives a flavour of the variety, even just a small subset, of the things that we are working on and thinking about 01:47:07.000 --> 01:47:23.000 across the various teams. Bruce has just posted the link to the next session, which is a discussion among Crossref staff and other members of our community on what we still need to build the research nexus. 01:47:23.000 --> 01:47:30.000 So we hope that you'll also join us for that. 01:47:30.000 --> 01:47:40.000 Yeah, if you want to take a few minutes to ask any further questions, or go and get a cup of tea, coffee or glass of water, this is your chance. 01:47:40.000 --> 01:48:10.000 And thank you again. 01:48:24.000 --> 01:48:42.000 Okay, I can see people starting to
head off and take up the opportunity of a coffee. Again, there are lots of mechanisms by which you can get in touch with Crossref, 01:48:42.000 --> 01:48:52.000 and our community forum is a great resource for that as well. So this isn't the end of the opportunity to ask questions and to find out more information. 01:48:52.000 --> 01:48:58.000 And obviously we'll follow up by posting the video and a link to the slides in the coming days, once we've pulled everything together. 01:48:58.000 --> 01:49:14.000 So I'm going to leave it there. Thanks again, and we hope to see you in the next session.