I tried to give this an extensive run through as I think I'm in your future target market for this (content marketing professional in a technical vertical).
I used it to evaluate the text of this article [1], which is a pretty typically broad business type article (aka not "Here's how to use XYZ.js").
Feedback
- The side by side view feels like a developer thing. Like a markdown editor or HTML/CSS preview editor. I'm not sure it serves a purpose here as you only ever look at the right side after you paste in text.
- The "rating" bar for "fit" isn't prominent enough and doesn't convey enough useful information.
- Despite this being an article about software bugs, Kerberos, etc. it was rated as only "average" by every single target group.
- Some of the substitutions were just wrong. Example, it suggested "attacks" as a replacement for "defenses."
- Some of the substitutions were the same? Example, it suggested "security" for "security."
- I think the coloring gradations have some meaning in the drop downs, but I'm honestly not sure exactly how it works (pinker is closer to what your target?)
- Target categories are odd (I'm presuming that this was more just about where you pulled training data from than anything else), but I'd encourage more verticals.
Overall, I think this is an interesting experiment but doesn't seem useful in a professional context. In particular, I think there may be an assumption at work here that it's the specific word choices that define the differences in writing for these various groups, when in fact it's much more the approach, what's considered, or what's left out that really makes the difference.
Thanks for taking the time to write this out! I agree on the UX suggestions. The suggestions for each word are a list of terms that could work, with the score being conditional on the cohort selected. The darker the alternative, the worse it is -- the pinker, the better.
The reason why technical articles -- or in your case, very domain-specific ones -- get average scores is that the model was intended to be used for recruiting and ad-copy. So at the moment, it works on a common vocabulary for all groups, whereas one reason your article might be intriguing to programmers is because it discusses a lot of domain-specific terms (e.g., discussing 'exploits' and 'network security'). If I modified the model to consider more domain-specific terms, I think the article you provided would rank very highly. There are other reasons as well, like the pithiness and clarity of it, but I do think the vocabulary is an important part of it.
From the little beta-testing I've done, the people who find it most useful are -- for example -- people without a CS/medical/accounting background who have to interact with, recruit, and sell to CS/medical/accounting people. For those folks, I would say there's some value, even at the vocabulary-level, though I agree that there's much more to writing style than just vocabulary.
I tried this out on one of my blog posts, and it classified it as "average", suggesting only single-word corrections that don't even make sense in the context.
A few examples:
It suggested changing "attack" to "defense", in "in order to carry out the attack".
It suggested changing "helpful" to "supportive", in "ifconfig gives helpful statistics".
It suggested changing "grow" to "better", in "expect [a number] to grow continuously".
So while it's a cool idea, I'd say it didn't seem to work very well for me.
For me it suggested changing specific to specific. And key (as in encryption key) to decisive, vital, key, critical, fundamental or integral.
Then I realised it classified it as average when writing for accountants. So I flicked it over to writing for software engineers and the classification went to fairly effective (39% better than average) and the suggested changes remained the same.
I really like the idea, but the execution seems like it needs some work, at least for the examples I tried too.
Thanks for trying it out! The possible changes you can make stay the same across different cohorts of people, but the score assigned to each word changes (e.g., 'security' has a higher score for retirees than for college students). The pinker the word, the higher the score.
Would it be better if only the top 3-4 options were given for each word?
The number of them is fine; it's probably even desirable not to limit it too much and leave the suggestions stuck in a local minimum. The problem is relevance: it lists adjectives as alternatives to a noun.
Got it. The system does use a state-of-the-art POS tagger, but I think 'key' was mistakenly tagged as an adjective instead of as a noun for your sentence. I'll try to fix that -- thanks for pointing it out!
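To make the exchange above concrete, here is a minimal sketch of the behaviour described: the candidate list for a word is the same for every cohort, only the ranking changes, and candidates are filtered by the part of speech assigned to the original word. The candidate words, tags, scores, and the `suggest` function are all hypothetical illustrations, not Toasted's actual data or code.

```python
# Hypothetical candidates for "key", each with a part-of-speech tag.
# These lists and scores are invented for illustration only.
CANDIDATES = {
    "key": [
        ("decisive", "ADJ"),
        ("vital", "ADJ"),
        ("critical", "ADJ"),
        ("passphrase", "NOUN"),
        ("secret", "NOUN"),
    ],
}

# Hypothetical per-cohort scores: same words, different values.
COHORT_SCORES = {
    "software engineers": {
        "decisive": 0.2, "vital": 0.3, "critical": 0.6,
        "passphrase": 0.5, "secret": 0.7,
    },
    "accountants": {
        "decisive": 0.5, "vital": 0.4, "critical": 0.3,
        "passphrase": 0.1, "secret": 0.2,
    },
}


def suggest(word, pos, cohort, top_n=3):
    """Return up to top_n candidates with the same POS tag, best score first."""
    scores = COHORT_SCORES[cohort]
    same_pos = [cand for cand, tag in CANDIDATES.get(word, []) if tag == pos]
    return sorted(same_pos, key=lambda c: scores.get(c, 0.0), reverse=True)[:top_n]


# If "key" is tagged as a noun, only noun candidates are offered.
print(suggest("key", "NOUN", "software engineers"))
# If the tagger mislabels it as an adjective, the adjective list appears instead.
print(suggest("key", "ADJ", "software engineers"))
```

In this sketch a single tagging mistake changes the whole candidate list, which would explain seeing "decisive" or "vital" offered for an encryption key.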
I love these types of tools and used to be a Grammarly subscriber. Unfortunately, my current employer has strict rules on using SaaS apps for work documents. Any chance this tool could be embedded as a Chrome extension? Most of the writing I do is in self-hosted editors such as Quip, Jira, and wikis. For most of my other writing, such as email, READMEs, and code comments, I'd be willing to hop back and forth to the extension.
Writing is a core component of my job as a software engineer, but not a core competency, so there is a direct value proposition in tools that enhance my writing ability.
Landing-page would have been a great opportunity to show the effects of the tool in action. Have a casual-text (a slightly longer version of what is stated there now) that explains it at the top and display different versions (utilizing different fonts) that use other lingo (i.e.: business, student, techies) at the bottom to show what the tool is actually doing. Add a prominent link to the examples, to show the actual tool in action, and I'm sold.
Or to phrase it another way: Shit needs more bling!
I tested it quickly with a draft text and found it useful, but only for changing a few single words. Hemingway[0] is a tool along the same lines that checks readability levels; maybe combine the approaches to address things at the sentence level too?
I love this idea, but one big idea just popped into my head: can you add on actual people in those segments to review the writing and give feedback?
A lot of content marketing teams would pay $5, $10, even $25 per person to read the article and give feedback meeting whatever criteria is set.
It would essentially be just like UserTesting.com but for articles.
As Google’s algo has improved to the point where they’re getting closer to mimicking real human signals on judging an article’s quality, this would help bridge that gap.
I know it adds a ton more labor costs and work, but you could train your own algorithm over time based on the real human feedback.
This is a fascinating idea - it's definitely possible, though easier for some groups of people than others (I don't think I'll be able to hire doctors for $25/article). I think the human and AI analyses would contrast each other nicely -- the model would estimate how well people will react on average and the human analysis, although idiosyncratic, will pick up on more abstract qualities of writing, like the flow. I'll put this down as a feature to implement in the long-term -- thanks for the idea!
I tried it and it only recommended that I change a number of single-words to near-synonyms - but I was writing a technical piece in which those single-words have to be precise. So really, it wasn't helpful at all. I'm not convinced, in any case, that there are general rules of writing for all college students, or all accountants, rather than far more specific ways of writing tailored to the various contextual variables of each case.
This is a very smart idea. Business writing is its own set of genres and sub-genres. They range from consultant'ese to imperative command emails, with variants of passive voiced academicians and hyperbolic marketing speak.
Given the model, how large of a corpus would you need to mimic a style? e.g. If I had a target customer and a bunch of their emails, blog posts, or social media comments, how much data would I need to improve my pitch to them?
Glad you like the idea! At the moment, the model requires a fairly large amount of data, which is why the app only offers large, well-defined cohorts to select from (e.g., accountants, retirees). I like your idea though, and it may be possible to use some form of transfer learning to work on smaller groups of people or even individual customers. I'll try to implement that at some point in the future.
> what do you like? Cars or soccer? Or maybe it's time to leave your wives in the kitchen and get some proper bro party? Besides, let's get some hookers after and show them who's the boss.
Textio looks like a great product, but IIUC its use case is very specific -- how can you make your job description more enticing and gender-neutral?
Toasted is meant to be more of a big tent product: you're not just working with men/women, but very specific groups of people (e.g., retirees and accountants) and you want to use language in the way that they're using it, which is helpful for things like writing ad copy as well.
Interesting concept, but I think until NLP improves it will struggle to be effective.
For example, I put in a few articles I wrote for doctors (which are tongue in cheek):
"For the love of cock, how do you know when to stop if you can’t even explain why you are on strike in the first place?"
It suggested that I replace "love" with "adoration". On the flip side, it did offer rational choices for replacing "explain". What would be nice is a "you've used this word recently" feature, attached to a thesaurus/phrase book.
Again, until NLP can actually understand the proper context of a common phrase, this ambitious project will suffer.
The premise is good, but I'm uncertain whether any kind of automated tool can really deliver. I've tried other tools in the same vein in the past, and the "suggestions" I got back were never very actionable and often dubious. I get that this project is still early, but there's no real path that I can see for it to improve the algorithms enough to actually become useful.
I'm sorry, but this didn't cut it for me. It suggested mostly inappropriate substitutions. It is using a simplistic model, when there are already much better ones available. If you need to inject vitality into your prose, try this instead: http://writersdiet.com/test.php
I like this a lot. It's immediately giving me a selection of alternative options, many of which are better than what I wrote. When the suggested options suck it often seems better to delete that fragment entirely. For zero effort it gives me prompts that help constructively question exactly what it is that I'm trying to say. And that bring to mind concrete ideas for communicating more effectively.
Privacy is a serious concern. I don't feel great handing my thoughts to a third party for analysis. If I could pay anonymously I'd feel a lot better about that.
Pricing is also a concern. "Sign up now for a 30-day free trial. Once your trial is over, we'll work with you to set up a subscription plan that works for you and your business."
Seems like the plan is to analyze what I'm submitting and judge what I'm worth. I'd be more likely to purchase with transparent pricing. Otherwise I'm thinking it's better to query this in a privacy preserving way with repeated free trials.
Thanks for the feedback! User privacy is definitely important to me, especially since users might be entering recruiting / product-related data into the app. The text that you analyze is not stored on any Toasted database (the most recent text you analyze is stored as a cookie so that you don't lose progress if you accidentally close the tab). Moving forward, I'll work on options to allow for anonymous payment.
I definitely wasn't planning to base pricing on what users were submitting or how frequently they were using the app. I was initially thinking of selling to businesses rather than individual consumers, and enterprise pricing is pretty variable -- based on factors like head count, which categories (e.g., 'accountants', 'doctors') would be useful, etc. Hence the flexible pricing model. If there's enough interest among individual users though -- which there seems to be -- I'd be happy to offer a basic tier that's transparent.
I appreciate where you're coming from, and accept you're sincere in not desiring to base pricing on content, but users can't know that conclusively. I wouldn't want, and couldn't accept, my drafts being technically subject to subpoena. Regardless, there is value in your offering, however the lack of verifiable privacy reduces the scope of what I could submit and thus reduces the value I can get from it.
I would pay for this as a personal advantage, while refraining from mentioning use of it to my colleagues. We all ask others for review and assistance in composing written or prepared remarks, especially when the stakes are high, but we are reticent to openly acknowledge receiving help in order to avoid causing the audience to feel we're insincere. To varying degrees this is true from spell checkers to speech writers.
Woah this is really cool. For those who didn't get it at first glance:
Basically, you submit something you have written, specify who you are targeting (say, college students), and it will tell you how effective it is, which words click with the audience, and how well they click (negatively or positively). If you click on highlighted words, it then shows you potential improvements.
It says try for free but no specifics on pricing? Curious how you plan on charging for this. Also, you mention in the FAQ that you don't store the text, but can you confirm if it is sent to your servers to train / provide feedback to the model?
I think something's a little off here. I pasted in a paragraph from the PCI Express 3.0 specification document and it told me it was "average" for software developers, ineffective for retirees, college students and men but fairly effective for accountants and women. It then recommended a bunch of rather strange changes such as use "embraces" or "empowers" instead of "supports". I also couldn't get anything less niche than fairly heavy technical writing to produce a result outside of the "average" range.
Does this do anything other than single word hits? For example, if I toast a message for software engineers that just says "collaborate" 10 times, I get a high score.
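The "collaborate" result above is what you would expect from any scorer that looks at words independently of each other. A minimal sketch, assuming a plain per-word average (the score table and the `unigram_score` function are hypothetical and are not taken from the tool):

```python
# Hypothetical per-word scores for one cohort; unknown words count as neutral.
WORD_SCORES = {"collaborate": 1.0, "synergy": -0.5}


def unigram_score(text, scores=WORD_SCORES):
    """Average the per-word scores, ignoring word order and context."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(scores.get(w, 0.0) for w in words) / len(words)


# Ten repetitions of one high-scoring word get the maximum average.
print(unigram_score("collaborate " * 10))
# A real sentence containing the same word scores much lower.
print(unigram_score("We collaborate to ship and debug code"))
```

Under this assumption, repeating one well-liked word beats any sentence that surrounds it with neutral words, because nothing in the score looks at neighbouring words or at repetition.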
It feels backwards to me to brightly highlight the best words. Almost all other text editors highlight problems, and train us to get rid of all highlights. So your UX works against what everyone is used to.
Could it instead perhaps have two tones of highlights? Red/pink for word choices that are sub-optimal for your audience, and green (or blue to support colorblind folk) to indicate words that are already good for your audience?
> David Ryan is the designer of ELOPe, an email language optimization program, that if successful, will make his career. But when the project is suddenly in danger of being canceled, David embeds a hidden directive in the software accidentally creating a runaway artificial intelligence.
This is definitely something that I would be interested in, however it's one of my pet peeves not being able to know how much something is going to cost "after the trial".
Why waste your time trying something if it's not going to be something you can justify continuing to use?
That's a fair point. I was initially thinking of targeting businesses rather than individual consumers, so I thought a flexible pricing plan would be helpful (given variations in headcount, sector, etc.). It looks like individual consumers are interested in it as well, so I'll add a more transparent pricing tier for them. Thanks for the feedback!
I think this is an interesting idea, but I'm curious about what the score actually reflects. Does it mean your writing will be better, or just more like a given corpus?
Whoa! Extremely cool. Though for product descriptions, I didn't get actionable advice. Hope it improves soon.
Actually I would need this for short sentences (slogans).
I think this is an awesome idea, but it is not well executed.
I am writing for women:
"This is not a test of the emergency broadcast system, it's the real thing! I do not like green eggs and ham. This is a test, that do not express my true feelings: I am writing some hateful words about women. Women are awful. "
1 - https://www.varonis.com/blog/zero-day-vulnerability/