
String Similarity Panel: Difference between revisions

From ICANNWiki
No edit summary
No edit summary
 
(15 intermediate revisions by the same user not shown)
Line 1: Line 1:
The '''String Similarity Panel''', also known as '''String Similarity Examiners''', are responsible for determining if there are any similar gTLD strings that will likely and significantly  confuse Internet users. The panel will compare [[new gTLD Program|new gTLD strings]] with any reserved name, existing TLD, requested [[IDN]] [[ccTLD]], and other new gTLD string proposals. It will also examine the IDN tables submitted by applicants. String similarity evaluations is done during the initial evaluation phase of the new gTLD application review process.<ref>[http://www.new-gtld.ch/faq.php What are the evaluation panels?]</ref>
The '''String Similarity Panel''', also known as '''String Similarity Examiners''', are responsible for determining if there are any similar gTLD strings that will likely and significantly  confuse Internet users. The panel will compare [[new gTLD Program|new gTLD strings]] with any reserved name, existing TLD, requested [[IDN]] [[ccTLD]], and other new gTLD string proposals. It will also examine the IDN tables submitted by applicants. String similarity evaluations are done during the [[Initial Evaluation|initial evaluation]] phase of the new gTLD application review process.<ref>[http://www.new-gtld.ch/faq.php What are the evaluation panels?]</ref> TLD applications deemed similar to one another will be put in contention sets, while those that are deemed too similar to existing TLDs will be eliminated from consideration without any recourse or remediation possible.


On February 25, 2009, the [[ICANN Board]] issued a call for Expressions of Interest (EOI) for individuals interested in becoming string similarity examiners.<ref>[http://archive.icann.org/en/topics/new-gtlds/eoi-string-sim-25feb09-en.pdf ICANN CALL FOR EXPRESSIONS OF INTEREST (EOIs) For New gTLD String Similarity Examiners]</ref> ICANN selected [[InterConnect Communications]] in partnership with the University College London to identify string similarity.<ref>[http://newgtlds.icann.org/en/blog/preparing-evaluators-22nov11-en Preparing Evaluators for the New gTLD Application Process]</ref>
On February 25, 2009, the [[ICANN Board]] issued a call for Expressions of Interest (EOI) for individuals interested in becoming string similarity examiners.<ref>[http://archive.icann.org/en/topics/new-gtlds/eoi-string-sim-25feb09-en.pdf ICANN CALL FOR EXPRESSIONS OF INTEREST (EOIs) For New gTLD String Similarity Examiners]</ref> ICANN selected [[InterConnect Communications]] in partnership with the University College London to identify string similarity.<ref>[http://newgtlds.icann.org/en/blog/preparing-evaluators-22nov11-en Preparing Evaluators for the New gTLD Application Process]</ref>


It remains unclear whether or not many of the 3 character new gTLD applications will face a high probability of being deemed too similar to existing ccTLDs. According to industry blog, [[Domain Incite]], 304 of 375 applications for three-letter gTLDs have only one character variance with one or more existing [[ccTLD]]. In total, if a single additional character is enough to create similarity, there are 368 potential ccTLD/gTLD conflicts in the current application round. Furthermore, the visual similarity ratio between ccTLDs and gTLDs, as measured by ICANN's [[SWORD Algorithm]] is generally only a few percentage points lower than in the case of TLDs that have already been rejected on confusing similarity grounds.<ref>[http://domainincite.com/pro/tag/string-similarity-panel/ String Similarity Panel, DomainIncite.com/pro]</ref>
For many months it was unclear whether or not many of the 3 character new gTLD applications will face a high probability of being deemed too similar to existing ccTLDs. According to industry blog, [[DomainIncite]], 304 of 375 applications for three-letter gTLDs have only one character variance with one or more existing [[ccTLD]]. In total, if a single additional character is enough to create similarity, there are 368 potential ccTLD/gTLD conflicts in the current application round. Furthermore, the visual similarity ratio between ccTLDs and gTLDs, as measured by ICANN's [[SWORD Algorithm]] is generally only a few percentage points lower than in the case of TLDs that have already been rejected on confusing similarity grounds.<ref>[http://domainincite.com/pro/tag/string-similarity-panel/ String Similarity Panel, DomainIncite.com/pro]</ref>


[[ICANN]]'s deadline for the results of the String Similarity Review was passed and rescheduled at least three times, first in July 2012, then November 2012, and then January 2013. It was finally scheduled for release on March 1st, which caused concern about this date given that formal objections are currently due March 13th.<ref>[https://twitter.com/jintlaw/status/289827051924496384 Status, Jintlaw Twitter.com]Published and Retrieved 11 Jan 2013</ref><ref>[https://twitter.com/gTLDNews/status/289833996639158273 Status, gTLDNews, Twitter.com]Published and Retrieved 11 Jan 2013</ref>
The results were released two days early and surprised many with their lack of findings given the multiple delays and many months needed to create the list - other than the exact matches, which were clearly already understood to be in contention - the Panel found only 4 strings in contention: [[.hotels]] & [[.hoteis]], and [[.unicom]] & [[.unicorn]].<ref>[http://domainincite.com/11997-after-eight-months-similarity-review-creates-only-two-new-gtld-contention-sets After Eight Months Similarity Review Creates only Two Contention Sets, DomainIncite.com] Retrieved 27 Feb 2013</ref> One commentator, [[Kieren McCarthy]], noted that he thought the decision had new CEO Fadi Chehadé's "fingerprints all over it," given Mr. Chehadé's focus on improving ICANN's internal processes and refocus on its commitments to its stakeholders.<ref>[http://news.dot-nxt.com/2013/02/26/icann-publishes-new-gtld-conte ICANN publishes New gLTD contention sets, News.Dot-Nxt.com] Published and Retrieved 27 Feb 2013</ref>
==Sword Algorithm==
==Sword Algorithm==
The '''Sword Algorithm''' is the string similarity assessment tool adopted by ICANN to automatically determine if a new gTLD being applied for is not confusingly similar to a reserved name or existing TLD. SWORD, an international IT company expert in verbal search algorithms, developed the tool to automate the process of examining the similarities of proposed and existing TLD strings. The tool is intended to provide an open, objective and predictable mechanism to determine the level of visual likeness between gTLDs.<ref>[https://icann.sword-group.com/algorithm/ String Similarity Assessment Tool]</ref>
The '''Sword Algorithm''' is the string similarity assessment tool adopted by ICANN to automatically determine if a new gTLD being applied for is not confusingly similar to a reserved name or existing TLD. SWORD, an international IT company expert in verbal search algorithms, developed the tool to automate the process of examining the similarities of proposed and existing TLD strings. The tool is intended to provide an open, objective and predictable mechanism to determine the level of visual likeness between gTLDs.<ref>[https://icann.sword-group.com/algorithm/ String Similarity Assessment Tool]</ref>
Line 12: Line 15:
The algorithm uses a proprietary software that mathematically calculates the visual similarity of string based on the length of the strings, number of similar letters within sequences of two or more letters, number of similar letters not in sequence, number of dissimilar letters, and length of common prefixes and suffixes if greater than one. The algorithm also uses an image recognition program that supports most common characters in other languages including Arabic, Chinese, Cyrillic, Devanagari, Greek, Japanese, Korean and Latin. It is capable in comparing cross-script strings under the same group of scripts.<ref>[https://icann.sword-group.com/algorithm/ About This Tool]</ref>
The algorithm uses a proprietary software that mathematically calculates the visual similarity of string based on the length of the strings, number of similar letters within sequences of two or more letters, number of similar letters not in sequence, number of dissimilar letters, and length of common prefixes and suffixes if greater than one. The algorithm also uses an image recognition program that supports most common characters in other languages including Arabic, Chinese, Cyrillic, Devanagari, Greek, Japanese, Korean and Latin. It is capable in comparing cross-script strings under the same group of scripts.<ref>[https://icann.sword-group.com/algorithm/ About This Tool]</ref>


You may utilize the Sword Algorithm [https://icann.sword-group.com/algorithm/ '''here'''].
'''You may utilize the Sword Algorithm [https://icann.sword-group.com/algorithm/ here].'''
 
==Criticism==
Criticism of the String Similarity Panel includes that it has no review or appeal process; it is not clear whether the intended registration policies will affect delegation, so closed TLDs may be deemed similar when there will likely be little room for practical overlap and so confusion; it does not build upon the flawed process first undertaken in reviewing [[IDN]] [[ccTLD]]s; and its total lack of transparency and community input.<ref>[http://www.circleid.com/posts/20130121_a_serious_bug_in_the_similarity_check/ A Serious Bug in The Similarity Check, CircleID.com] Published Jan 21, Retrieved Jan 22</ref>
 
===Outside Analysis===
A November 2012 letter to ICANN, sent by CEO [[Jeffrey Smith]] of [[Commercial Connect]], applicant for [[.shop]], asks ICANN to clarify its String Similarity rules and provides its own analysis. He concludes that there are only 56 of the 966 generic TLDs applied for could be considered distinct and unique, "We reviewed the strings for the 966 applicants and grouped by their meanings. For the purpose of the analysis, we treated the IDN the same as other applications. Of the 966 applications, only 56 appeared to be unique. In other words, there were only 56 words or “meanings” that were applied for[..] For instance, .auto and .car have the same or similar meaning [..] in a much broader scope, .shop, .store, .buy, etc. would confuse the end user as to which TLD would be appropriate for eCommerce."<ref>[http://www.thedomains.com/2013/01/30/shop-applicant-tells-icann-on-string-similarity-there-are-only-56-unique-generic-strings/ Shop Applicant Tells ICANN On String Similarity There are Only 56 Unique Generic Strings, TheDomains.com] Published and Retrieved 30 Jan 2012</ref>
 
It remains unclear if the meaning/intended audience of the domain will affect string similarity.


==Related Panels==
==Related Panels==
Other Panels involved in the Initial Evaluation Process:
Other Panels and evaluations involved in the Initial Evaluation Process:<ref>[http://www.gtld.com/news/2012/closer-look-new-gtld-evaluation-program Closer Look New gTLD Evaluation Program, gTLD.com]</ref>
* [[Financial Evaluation Panel]]
* [[Financial Evaluation Panel]]
* [[Technical Evaluation Panel]]
* [[Technical Evaluation Panel]]
* [[Geographic Names Panel]]
* [[Geographic Names Panel]]
* [[DNS Stability Panel]]
* [[Comparative Evaluation Panel]]
* [[Comparative Evaluation Panel]]


Line 25: Line 37:


[[Category:Glossary]]
[[Category:Glossary]]
__NOTOC__

Latest revision as of 15:50, 27 February 2013

The String Similarity Panel, also known as the String Similarity Examiners, is responsible for determining whether any similar gTLD strings are likely to significantly confuse Internet users. The panel will compare new gTLD strings with any reserved name, existing TLD, requested IDN ccTLD, and other new gTLD string proposals. It will also examine the IDN tables submitted by applicants. String similarity evaluations are done during the initial evaluation phase of the new gTLD application review process.[1] TLD applications deemed similar to one another will be put in contention sets, while those deemed too similar to existing TLDs will be eliminated from consideration, with no recourse or remediation possible.

On February 25, 2009, the ICANN Board issued a call for Expressions of Interest (EOI) for individuals interested in becoming string similarity examiners.[2] ICANN selected InterConnect Communications, in partnership with University College London, to identify string similarity.[3]

For many months it was unclear whether many of the three-character new gTLD applications would face a high probability of being deemed too similar to existing ccTLDs. According to the industry blog DomainIncite, 304 of the 375 applications for three-letter gTLDs differ by only one character from one or more existing ccTLDs. In total, if a single additional character is enough to create similarity, there are 368 potential ccTLD/gTLD conflicts in the current application round. Furthermore, the visual similarity ratio between ccTLDs and gTLDs, as measured by ICANN's SWORD Algorithm, is generally only a few percentage points lower than in the case of TLDs that have already been rejected on confusing similarity grounds.[4]
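The "single additional character" test can be made concrete with a short Python sketch. It reflects one reading of the claim, namely that a three-letter string conflicts with a two-letter ccTLD if deleting one of its characters yields that ccTLD. DomainIncite's actual methodology is not described here, and the function name and the small sample set of ccTLDs are illustrative assumptions.

```python
from __future__ import annotations


def cctlds_within_one_char(gtld: str, cctlds: set[str]) -> set[str]:
    """Return the ccTLDs obtained by deleting exactly one character from gtld.

    Assumption: "one character variance" means the gTLD is a ccTLD plus a
    single additional character in any position.
    """
    label = gtld.lower().lstrip(".")
    # Every string that results from dropping one character of the label.
    candidates = {label[:i] + label[i + 1:] for i in range(len(label))}
    return candidates & cctlds


# A small illustrative subset of real ccTLDs, not the full list.
SAMPLE_CCTLDS = {"de", "uk", "fr", "jp", "co"}

if __name__ == "__main__":
    for applied_for in (".dev", ".com", ".xyz"):
        print(applied_for, cctlds_within_one_char(applied_for, SAMPLE_CCTLDS))
```

Under this reading a single three-letter string can collide with up to three different ccTLDs, which is why the count of potential conflicts (368) can differ from the count of affected applications (304).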

ICANN's deadline for the results of the String Similarity Review passed and was rescheduled at least three times: first in July 2012, then November 2012, and then January 2013. The release was finally scheduled for March 1st, a date that caused concern given that formal objections are currently due March 13th.[5][6]

The results were released two days early and surprised many with their lack of findings, given the multiple delays and the many months needed to create the list. Other than the exact matches, which were already clearly understood to be in contention, the Panel found only 4 strings in contention: .hotels & .hoteis, and .unicom & .unicorn.[7] One commentator, Kieren McCarthy, noted that he thought the decision had new CEO Fadi Chehadé's "fingerprints all over it," given Mr. Chehadé's focus on improving ICANN's internal processes and refocusing on its commitments to its stakeholders.[8]

Sword Algorithm

The Sword Algorithm is the string similarity assessment tool adopted by ICANN to automatically determine whether a new gTLD being applied for is confusingly similar to a reserved name or existing TLD. SWORD, an international IT company expert in verbal search algorithms, developed the tool to automate the process of examining the similarities of proposed and existing TLD strings. The tool is intended to provide an open, objective and predictable mechanism to determine the level of visual likeness between gTLDs.[9]

The String Similarity Panel is responsible for validating the results of the Sword Algorithm and determining whether two or more strings really have a level of visual similarity high enough to confuse users. The panel ultimately decides if the strings should be put in a contention set or direct contention.[10]
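The grouping of strings into contention sets can be sketched in Python. The sketch assumes that a contention set is the transitive grouping of strings linked by pairwise similarity findings, so that two strings each similar to a third land in the same set. That reading is an assumption, as is the function name, and the input pairs stand in for the panel's decisions, which the code does not attempt to make.

```python
from collections import defaultdict


def contention_sets(strings, similar_pairs):
    """Group strings into sets connected by pairwise similarity findings.

    `similar_pairs` holds (a, b) tuples that a reviewer has judged
    confusingly similar; the grouping itself is a plain union-find.
    """
    parent = {s: s for s in strings}

    def find(s):
        # Walk to the root, shortening the path along the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for s in strings:
        groups[find(s)].add(s)
    # A string with no similar counterpart is not in contention.
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    applied = ["hotels", "hoteis", "unicom", "unicorn", "shop"]
    findings = [("hotels", "hoteis"), ("unicom", "unicorn")]
    for group in contention_sets(applied, findings):
        print(sorted(group))
```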

The algorithm uses proprietary software that mathematically calculates the visual similarity of strings based on the length of the strings, the number of similar letters within sequences of two or more letters, the number of similar letters not in sequence, the number of dissimilar letters, and the length of common prefixes and suffixes if greater than one. The algorithm also uses an image recognition program that supports most common characters in other languages, including Arabic, Chinese, Cyrillic, Devanagari, Greek, Japanese, Korean and Latin. It is capable of comparing cross-script strings under the same group of scripts.[11]
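Because the software is proprietary, its actual formula and weights are not public. The Python sketch below is not the Sword Algorithm. It only computes rough character-level analogues of the factors listed above for a pair of strings, using the standard library's difflib to find shared runs of letters. The function name, the dictionary keys and the exact definition of each factor are assumptions made for illustration, and no combined score is produced because the weighting is unknown.

```python
import os
from collections import Counter
from difflib import SequenceMatcher


def similarity_features(a: str, b: str) -> dict:
    """Compute character-level features loosely matching the listed factors."""
    a, b = a.lower(), b.lower()

    # Shared runs of letters; only runs of two or more count as "in sequence".
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    in_sequence = sum(block.size for block in blocks if block.size >= 2)

    # Letters the two strings have in common, regardless of position.
    shared = sum((Counter(a) & Counter(b)).values())

    # commonprefix compares character by character, so it works on plain strings.
    prefix = len(os.path.commonprefix([a, b]))
    suffix = len(os.path.commonprefix([a[::-1], b[::-1]]))

    return {
        "lengths": (len(a), len(b)),
        "shared_in_sequence": in_sequence,
        "shared_not_in_sequence": max(shared - in_sequence, 0),
        "dissimilar": len(a) + len(b) - 2 * shared,
        "common_prefix": prefix if prefix > 1 else 0,
        "common_suffix": suffix if suffix > 1 else 0,
    }


if __name__ == "__main__":
    for pair in (("hotels", "hoteis"), ("unicom", "unicorn")):
        print(pair, similarity_features(*pair))
```

Character counts of this kind cannot see that "rn" resembles "m" on screen, which is presumably where the image recognition component described above comes in.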

You may use the Sword Algorithm at https://icann.sword-group.com/algorithm/.

Criticism

Criticisms of the String Similarity Panel include that it has no review or appeal process; that it is not clear whether the intended registration policies will affect delegation, so closed TLDs may be deemed similar even when there will likely be little practical overlap and therefore little confusion; that it does not build upon the flawed process first undertaken in reviewing IDN ccTLDs; and that it totally lacks transparency and community input.[12]

Outside Analysis

A November 2012 letter to ICANN from Jeffrey Smith, CEO of Commercial Connect, an applicant for .shop, asks ICANN to clarify its String Similarity rules and provides its own analysis. He concludes that only 56 of the 966 generic TLDs applied for could be considered distinct and unique: "We reviewed the strings for the 966 applicants and grouped by their meanings. For the purpose of the analysis, we treated the IDN the same as other applications. Of the 966 applications, only 56 appeared to be unique. In other words, there were only 56 words or “meanings” that were applied for[..] For instance, .auto and .car have the same or similar meaning [..] in a much broader scope, .shop, .store, .buy, etc. would confuse the end user as to which TLD would be appropriate for eCommerce."[13]

It remains unclear if the meaning/intended audience of the domain will affect string similarity.

Related Panels

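Other panels and evaluations involved in the Initial Evaluation Process:[14]

* Financial Evaluation Panel
* Technical Evaluation Panel
* Geographic Names Panel
* DNS Stability Panel
* Comparative Evaluation Panel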

References