Duplicate Literature Detection for Cross-Library Search

By: Wei Liu and Jianxun Zeng
Open Access | Jun 2016

DOI: https://doi.org/10.1515/cait-2016-0028 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702
Language: English
Page range: 160 - 178
Published on: Jun 22, 2016
Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2016 Wei Liu, Jianxun Zeng, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.