How can I convert HTML to RTF (Rich Text) in .NET without paying for a component?

Is there a free third-party or .NET class that converts HTML to RTF (for use in a rich-text-enabled Windows Forms control)?

The "free" requirement comes from the fact that I am only working on a prototype and can simply load the BrowserControl and display HTML if need be (even if it is slow), and that Developer Express is going to release its own such control soon. I do not want to learn to write RTF by hand, and I already know HTML, so I figure this is the quickest way to get some demonstrable code out the door.

2008-09-29 Josh Kodroff

Actually, there is a simple and free solution: use your browser. This is the trick I used:

var webBrowser = new WebBrowser();
webBrowser.CreateControl(); // only if needed
webBrowser.DocumentText = *yourhtmlstring*;
while (webBrowser.DocumentText != *yourhtmlstring*)
    Application.DoEvents(); // keep pumping messages until the document has loaded
webBrowser.Document.ExecCommand("SelectAll", false, null);
webBrowser.Document.ExecCommand("Copy", false, null);
*yourRichTextControl*.Paste();

This may be slower than other methods, but at least it is free and it works!

2011-01-31 18:34:04 Spartaco

This is a great solution. There will be some delay overhead, but for large documents it will be pretty fast and the quality will be good.

A nice workaround, but images were not copied correctly. – Amr

This is just what I needed, thanks!

Maybe what you need is a control to edit the HTML?

2008-09-30 08:10:05 GvS

It is certainly not perfect, but here is the code I use to convert HTML to plain text.

using System.Text.RegularExpressions;

public static string ConvertHtmlToText(string source)
{
    // Remove source formatting. Browsers render line breaks as a space,
    // so replace them with spaces, and drop tabs used for indentation.
    string result = source.Replace("\r", " ");
    result = result.Replace("\n", " ");
    result = result.Replace("\t", string.Empty);
    // Collapse repeated spaces, because browsers ignore them.
    result = Regex.Replace(result, @"( )+", " ");

    // Remove the header (normalize the tags first by clearing attributes).
    result = Regex.Replace(result, @"<( )*head([^>])*>", "<head>", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"(<( )*(/)( )*head( )*>)", "</head>", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, "(<head>).*?(</head>)", string.Empty, RegexOptions.IgnoreCase);

    // Remove all scripts (normalize the tags first by clearing attributes).
    // The lazy .*? stops at the nearest closing tag, so text between two
    // separate script blocks is kept.
    result = Regex.Replace(result, @"<( )*script([^>])*>", "<script>", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"(<( )*(/)( )*script( )*>)", "</script>", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"(<script>).*?(</script>)", string.Empty, RegexOptions.IgnoreCase);

    // Remove all styles (normalize the tags first by clearing attributes).
    result = Regex.Replace(result, @"<( )*style([^>])*>", "<style>", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"(<( )*(/)( )*style( )*>)", "</style>", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, "(<style>).*?(</style>)", string.Empty, RegexOptions.IgnoreCase);

    // Insert a tab in place of each <td> tag.
    result = Regex.Replace(result, @"<( )*td([^>])*>", "\t", RegexOptions.IgnoreCase);

    // Insert a line break in place of <br> and <li> tags.
    result = Regex.Replace(result, @"<( )*br( )*>", "\r", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"<( )*li( )*>", "\r", RegexOptions.IgnoreCase);

    // Insert a paragraph break (double line break) in place of
    // <div>, <tr> and <p> tags.
    result = Regex.Replace(result, @"<( )*div([^>])*>", "\r\r", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"<( )*tr([^>])*>", "\r\r", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"<( )*p([^>])*>", "\r\r", RegexOptions.IgnoreCase);

    // Remove remaining tags such as <a>, images and comments:
    // anything that is enclosed inside < >.
    result = Regex.Replace(result, @"<[^>]*>", string.Empty);

    // Replace special characters.
    result = Regex.Replace(result, @"&nbsp;", " ", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&bull;", " * ", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&lsaquo;", "<", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&rsaquo;", ">", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&trade;", "(tm)", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&frasl;", "/", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&lt;", "<", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&gt;", ">", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&copy;", "(c)", RegexOptions.IgnoreCase);
    result = Regex.Replace(result, @"&reg;", "(r)", RegexOptions.IgnoreCase);
    // Remove all other entities. More can be added, see
    // http://hotwired.lycos.com/webmonkey/reference/special_characters/
    result = Regex.Replace(result, @"&(.{2,6});", string.Empty, RegexOptions.IgnoreCase);

    // Make line breaks consistent.
    result = result.Replace("\n", "\r");

    // Remove spaces in between line breaks and tabs.
    result = Regex.Replace(result, "(\r)( )+(\r)", "\r\r");
    result = Regex.Replace(result, "(\t)( )+(\t)", "\t\t");
    result = Regex.Replace(result, "(\t)( )+(\r)", "\t\r");
    result = Regex.Replace(result, "(\r)( )+(\t)", "\r\t");
    // Remove redundant tabs in between line breaks.
    result = Regex.Replace(result, "(\r)(\t)+(\r)", "\r\r");
    // Replace multiple tabs following a line break with just one tab.
    result = Regex.Replace(result, "(\r)(\t)+", "\r\t");
    // Replace more than 2 line breaks with 2, and more than 4 tabs with 4.
    result = Regex.Replace(result, "\r{3,}", "\r\r");
    result = Regex.Replace(result, "\t{5,}", "\t\t\t\t");

    return result;
}

2008-09-30 21:11:35 Andrew

Downvoted, for the reasons explained very effectively here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Ironically, XSLT can be error-prone for almost the same reasons. HTML is messy, and it is rarely a well-formed XML document that is ready for transformation. I suspect a proper solution would include a bit of regex to clean up the document enough for a proper XSLT transformation. Great for XHTML, though. – Menefee
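A side note on the entity handling in ConvertHtmlToText above: instead of replacing entities one at a time, the decoding step could be delegated to the framework. The following is only a minimal sketch, assuming .NET 4.0 or later, where System.Net.WebUtility.HtmlDecode is available. The names HtmlTextSketch and StripTagsAndDecode are hypothetical, and the tag-stripping regex is the same crude pattern used in the answer, so the objections raised in the comments still apply to it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch // hypothetical name, for illustration only
{
    // Strips anything enclosed in < > and then decodes HTML entities.
    // Tags are removed before decoding so that escaped markup such as
    // &lt;b&gt; survives as literal text instead of being stripped.
    public static string StripTagsAndDecode(string html)
    {
        string withoutTags = Regex.Replace(html, @"<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Unlike the hand-written replacements, this produces the actual characters (for example the copyright sign rather than "(c)"), so it is not a drop-in substitute if ASCII-only output is required.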

Check out this CodeProject article on XHTML2RTF (I was not the original author; it was adapted from code found on the web).

2009-04-16 03:03:40

... but, as I would guess from reading the name, it does not work for non-XHTML ("vanilla HTML") either ... – sager89

Awesome! I made a console application out of this. I had to add [STAThread] in front of the console's Main method. – dforce
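The [STAThread] remark can be illustrated with a small console program. This is only a sketch, assuming a Windows console project that references System.Windows.Forms. The class name, the sample HTML and the output file name are made up for illustration, and the polling loop is the one used in the WebBrowser answers in this thread.

```csharp
using System;
using System.Windows.Forms;

internal static class HtmlToRtfConsole // hypothetical name
{
    // WebBrowser wraps an ActiveX control and the clipboard uses OLE;
    // both require the calling thread to be a single-threaded apartment.
    [STAThread]
    private static void Main()
    {
        // Sample input, for illustration only.
        string html = "<html><body><p>Hello <b>world</b></p></body></html>";

        using (var browser = new WebBrowser())
        using (var richText = new RichTextBox())
        {
            browser.CreateControl();
            browser.DocumentText = html;

            // Pump messages until the document text has been loaded.
            while (browser.DocumentText != html)
                Application.DoEvents();

            browser.Document.ExecCommand("SelectAll", false, null);
            browser.Document.ExecCommand("Copy", false, null);

            richText.Paste();                // paste the clipboard contents as rich text
            richText.SaveFile("output.rtf"); // hypothetical path; SaveFile(string) writes RTF
        }
    }
}
```

Note that this approach overwrites whatever the user currently has on the clipboard.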

Expanding on Spartaco's answer, I implemented the following, which works great!

Using reportWebBrowser As New WebBrowser
    reportWebBrowser.CreateControl()
    reportWebBrowser.DocumentText = sbHTMLDoc.ToString
    While reportWebBrowser.DocumentText <> sbHTMLDoc.ToString
        Application.DoEvents()
    End While
    reportWebBrowser.Document.ExecCommand("SelectAll", False, Nothing)
    reportWebBrowser.Document.ExecCommand("Copy", False, Nothing)

    Using reportRichTextBox As New RichTextBox
        reportRichTextBox.Paste()
        reportRichTextBox.SaveFile(DocumentFileName)
    End Using
End Using

2011-02-17 21:01:21 cjbarth

Watch out for memory allocation issues if you do not always call 'Dispose()' on the controls you create. – Seph

Thanks @Seph. I changed the code to account for that. – cjbarth
