As you may have noticed, we have recently introduced a very simple but robust Markdown editor.
Before that, we stored posts in JSON format, with custom parsing for the different post parts such as paragraphs, images, etc. This turned out to be a bad engineering decision: we had to maintain the parsing logic ourselves, it took more resources than expected, and it caused, well, some frustration for many of you. For me too :)
Now that the underlying format of posts is Markdown, some dependent features need to be adjusted too. One of them is extracting the first image in a post for SEO purposes. When you share your post on Twitter, you want the image to be shown to your audience, something like this:
The task is, basically, to take something in this format:
and get this as the result:
Easy task, right? Well, I spent the last 2-3 hours just trying to figure out how to extract the very first image URL from Markdown text.
The algorithm is as follows:
- parse out all `![]()` occurrences from the Markdown
- return `undefined` if none are found; otherwise, map each one to the URL within the round brackets
- find the first URL with an allowed image extension and return it
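To make the three steps concrete, here is a minimal TypeScript sketch. It is not the code from the gist: the function names, the regular expression, and the list of allowed extensions (`jpg`, `jpeg`, `png`, `gif`, `webp`) are assumptions for illustration. It only handles inline images such as `![alt](url)` or `![alt](url "title")`, not reference-style images or image syntax inside code blocks.

```typescript
// Hypothetical whitelist of allowed image extensions (an assumption, not Honest's actual list).
const ALLOWED_EXTENSIONS: string[] = ["jpg", "jpeg", "png", "gif", "webp"];

// True if the URL's path (query string and fragment ignored) ends in an allowed extension.
function hasAllowedExtension(url: string): boolean {
  const path = url.split(/[?#]/)[0];
  const dot = path.lastIndexOf(".");
  if (dot === -1) return false;
  const ext = path.slice(dot + 1).toLowerCase();
  return ALLOWED_EXTENSIONS.indexOf(ext) !== -1;
}

function extractFirstImageUrl(markdown: string): string | undefined {
  // Step 1: parse out all inline images of the form ![alt](url) or ![alt](url "title").
  // Assumption: the URL contains no spaces or ")"; reference-style images are ignored.
  // The regex literal is created per call, so its lastIndex state never leaks between calls.
  const pattern = /!\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    // Step 2: keep only the URL inside the round brackets (capture group 1).
    urls.push(match[1]);
  }
  // Step 2 (cont.): undefined if no image was found at all.
  if (urls.length === 0) return undefined;

  // Step 3: return the first URL with an allowed extension.
  for (const url of urls) {
    if (hasAllowedExtension(url)) return url;
  }
  return undefined;
}
```

For example, `extractFirstImageUrl('![chart](a.svg) ![cover](b.png)')` skips the SVG, because it is not in the assumed whitelist, and returns `b.png`.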
I published the code snippet as a GitHub gist here. Do you have an idea how I can optimize this code further? The performance of Honest is at stake!
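One possible direction for the optimization question, offered as a sketch rather than a measured improvement: the algorithm above collects every image before filtering, but the extension check can be folded into the regular expression itself, so the scan stops at the first image with an allowed extension. The function name and the extension list (`jpg`, `jpeg`, `png`, `gif`, `webp`) are hypothetical, and whether this is actually faster depends on real posts, so it is worth benchmarking.

```typescript
// One regex that only matches inline images whose URL ends in an allowed extension
// (hypothetical list: jpg, jpeg, png, gif, webp), optionally followed by a query string
// or fragment and an optional "title". The i flag makes the extension check case-insensitive.
const FIRST_ALLOWED_IMAGE =
  /!\[[^\]]*\]\(\s*([^)\s]+?\.(?:jpe?g|png|gif|webp)(?:[?#][^)\s]*)?)(?:\s+"[^"]*")?\s*\)/i;

function firstImageUrl(markdown: string): string | undefined {
  // Cheap pre-check: skip the regex entirely for posts without any image syntax.
  if (markdown.indexOf("![") === -1) return undefined;
  // Without the g flag, match() returns only the leftmost successful match,
  // i.e. the first allowed image in document order, with its capture groups.
  const match = markdown.match(FIRST_ALLOWED_IMAGE);
  return match === null ? undefined : match[1];
}
```

The trade-off is readability: the whitelist now lives inside the pattern, so adding an extension means editing the regex rather than an array.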
Follow the Honest dev progress here.