# Convert your YouTube subscriptions to RSS feeds

> Fun fact: I'm such a zoomer that I only discovered RSS recently (which is [years] old).

## Get all subscriptions

You could probably do something fancy with the Google API, but that would be overkill.
Go to <https://youtube.com/feed/channels> and scroll to the bottom (good luck if you have 1000+, your middle finger might get cramped).  
Right-click on something that is not a link or an image, select `Save as`, give the file a name and save it.

## Parse the list of subscriptions

Extract all the channel URLs, replacing `channels.html` with the name of the HTML file you saved.

```
# Extract the unique channel URLs from the saved page.
grep -o -E 'href="https://www\.youtube\.com/(c|channel|user)/[a-zA-Z0-9_-]+"' channels.html |
    sort -u |
    sed 's/href="\(.*\)"/\1/' > channel_urls
```

Some channels aren't prefixed with `/c/`, `/channel/` or `/user/` in the URL,
so you'll either have to add them manually or change the `grep` regex to accept all URLs
that begin with `https://www.youtube.com/` and then remove the links that aren't YouTube channels.

> Those channels are pretty rare tho, out of my 300+ subscriptions I only had 2 or 3.
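
If you go for the broader regex, something like this rough sketch could work. The function name `extract_channel_urls` and the list of non-channel pages it filters out are my own assumptions, so extend the list with whatever junk shows up in your file.

```shell
# Sketch: grab every youtube.com link from the saved page,
# then drop the pages that are known not to be channels.
# The exclusion list below is an assumption, not an exhaustive one.
extract_channel_urls() {
    grep -o -E 'href="https://www\.youtube\.com/[^"?#]+"' "$1" |
        sed 's/href="\(.*\)"/\1/' |
        grep -v -E '^https://www\.youtube\.com/(watch|feed|playlist|results|shorts|about)(/|$)' |
        sort -u
}
# Usage: extract_channel_urls channels.html > channel_urls
```

Links with a query string (like video links) never match the first `grep`, so only the path-style pages need to be filtered by name.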

## Choose the channels to add to your feeds

Now comes the tedious and cringeworthy part where you need to go through aaaall your old and obscure subscriptions and filter the bad ones out.  
> Protip: If you want to automate this part you can ask Google to do it for you since they know you better than you know yourself by now.
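
Editing `channel_urls` by hand in a text editor works fine, but if you'd rather answer yes or no for each channel, here is a minimal sketch. The helper name `pick_channels` and the output file name are made up for the example.

```shell
# Sketch: ask y/n for every URL in file $1 and write the kept ones to file $2.
# URLs are read on file descriptor 3 so that answers can come from stdin.
pick_channels() {
    : > "$2"                                   # start with an empty output file
    while IFS= read -r url <&3; do
        printf 'Keep %s? [y/N] ' "$url" >&2    # prompt on stderr
        IFS= read -r answer || answer=n        # no more input counts as "no"
        case "$answer" in
            [yY]*) printf '%s\n' "$url" >> "$2" ;;
        esac
    done 3< "$1"
}
# Usage: pick_channels channel_urls channel_urls_kept
```

Anything other than an answer starting with `y` drops the channel, so mashing Enter gets you through the obscure ones quickly.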


## Get channel info

I guess most RSS readers understand the HTML tag `<link rel="alternate" type="application/rss+xml" .../>`, but [newsboat]() (which I use) unfortunately doesn't.  
We can get the URL of a channel's feed with a simple `curl` piped into `grep`.

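```
# Fetch every channel page and keep only the <title> and the feed URL.
xargs -a channel_urls curl -s |
    stdbuf -oL grep -o -E \
        -e '<title>.* - YouTube</title>' \
        -e 'https://www\.youtube\.com/feeds/videos\.xml\?channel_id=[a-zA-Z0-9_-]+' |
    awk '!seen[$0]++' |                             # drop repeated matches
    sed 's:<title>\(.*\) - YouTube</title>:\1:' |   # strip the <title> markup
    sed 'N; s/\n/  # /' |                           # join each title with its URL
    sed 's/\(.*\)  # \(https:.*\)/\2  # \1/' |      # swap to "url  # title"
    tee urls |                                      # save the result
    cat -n                                          # and show numbered progress
```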
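
The pipeline above pairs lines up two by two, so a page that is missing its title or its feed link would shift every pair after it. As a hedged alternative, here is a sketch that handles one page at a time. The function name `channel_feed_line` is hypothetical, and it assumes the page still contains a `<title>` ending in ` - YouTube` and a `videos.xml` feed link.

```shell
# Sketch: read one channel page on stdin and print "FEED_URL  # TITLE".
# Prints nothing (and returns non-zero) if no feed link is found.
channel_feed_line() {
    html=$(cat)
    # [^<]* keeps the match inside a single <title> element
    title=$(printf '%s\n' "$html" |
        grep -o -E '<title>[^<]* - YouTube</title>' | head -n 1 |
        sed 's:<title>\(.*\) - YouTube</title>:\1:')
    feed=$(printf '%s\n' "$html" |
        grep -o -E 'https://www\.youtube\.com/feeds/videos\.xml\?channel_id=[a-zA-Z0-9_-]+' |
        head -n 1)
    [ -n "$feed" ] && printf '%s  # %s\n' "$feed" "$title"
}
# Usage: curl -s "$channel_url" | channel_feed_line >> urls
```

It is slower to wire up than the one-liner, since you need a loop over `channel_urls`, but a broken page only costs you that one channel.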

## Sources

* [Luke Smith]()